Eric Blake
2019-Jun-12 21:00 UTC
[Libguestfs] [nbdkit PATCH v3 0/5] Play with libnbd for nbdkit-nbd
libnbd-0.1.4-1 is now available in Fedora 29/30 updates testing.

Changes since v2: rebased to master; bumped the libnbd requirement
from 0.1.2 to 0.1.3+; added tests for TLS usage, which flushed out the
need to turn relative pathnames into absolute ones; doc tweaks.

Now that the testsuite covers TLS and libnbd has been fixed to provide
the things I found lacking when developing v2, I'm leaning towards
pushing this on Monday, or sooner if I get a positive review.

Eric Blake (5):
  nbd: Check for libnbd
  nbd: s/nbd_/nbdplug_/
  nbd: Use libnbd 0.1.3+
  nbd: Add magic uri=... parameter
  nbd: Add TLS client support

 plugins/nbd/nbdkit-nbd-plugin.pod |  112 ++-
 configure.ac                      |   18 +
 plugins/nbd/nbd-standalone.c      | 1369 +++++++++++++++++++++++++++++
 plugins/nbd/nbd.c                 | 1364 +++++++++-------------------
 TODO                              |   13 +-
 plugins/nbd/Makefile.am           |   25 +-
 tests/Makefile.am                 |    6 +-
 tests/test-nbd-tls-psk.sh         |   81 ++
 tests/test-nbd-tls.sh             |   82 ++
 tests/test-tls-psk.sh             |    2 +-
 tests/test-tls.sh                 |    4 +-
 11 files changed, 2101 insertions(+), 975 deletions(-)
 create mode 100644 plugins/nbd/nbd-standalone.c
 create mode 100755 tests/test-nbd-tls-psk.sh
 create mode 100755 tests/test-nbd-tls.sh

-- 
2.20.1
This patch merely adds the framework to compile the nbd plugin either
as standalone (the version frozen in time to this commit) or as a
client of the brand-new libnbd (a later patch will actually enable
that part). Since libnbd does not yet have a stable API, falling back
to the standalone version makes sense for a while longer. The
configure check requires at least 0.1.3 (at the moment, Fedora 29 has
access to pre-built 0.1.4).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 configure.ac                 |   19 +
 plugins/nbd/nbd-standalone.c | 1369 ++++++++++++++++++++++++++++++++++
 plugins/nbd/Makefile.am      |   17 +-
 3 files changed, 1403 insertions(+), 2 deletions(-)
 create mode 100644 plugins/nbd/nbd-standalone.c

diff --git a/configure.ac b/configure.ac
index 2757630f..7d8fbd9f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -711,6 +711,23 @@ AS_IF([test "$with_zlib" != "no"],[
 ])
 AM_CONDITIONAL([HAVE_ZLIB],[test "x$ZLIB_LIBS" != "x"])
 
+dnl Check for libnbd (only if you want to compile the full nbd plugin).
+AC_ARG_WITH([libnbd],
+    [AS_HELP_STRING([--without-libnbd],
+        [disable nbd plugin @<:@default=check@:>@])],
+    [],
+    [with_libnbd=check])
+AS_IF([test "$with_libnbd" != "no"],[
+    PKG_CHECK_MODULES([LIBNBD], [libnbd >= 0.1.3],[
+        AC_SUBST([LIBNBD_CFLAGS])
+        AC_SUBST([LIBNBD_LIBS])
+        AC_DEFINE([HAVE_LIBNBD],[1],[libnbd found at compile time.])
+    ],
+    [AC_MSG_WARN([libnbd >= 0.1.3 not found, nbd plugin will be crippled])])
+])
+#test "x$LIBNBD_LIBS" != "x"
+AM_CONDITIONAL([HAVE_LIBNBD], [false])
+
 dnl Check for liblzma (only if you want to compile the xz filter).
 AC_ARG_WITH([liblzma],
     [AS_HELP_STRING([--without-liblzma],
@@ -984,6 +1001,8 @@ feature "iso .................................... " \
         test "x$HAVE_ISO_TRUE" = "x"
 feature "libvirt ................................ " \
         test "x$HAVE_LIBVIRT_TRUE" = "x"
+feature "nbd .................................... " \
+        test "x$HAVE_LIBNBD_TRUE" = "x"
 feature "ssh .................................... " \
         test "x$HAVE_SSH_TRUE" = "x"
 feature "tar .................................... " \
diff --git a/plugins/nbd/nbd-standalone.c b/plugins/nbd/nbd-standalone.c
new file mode 100644
index 00000000..d176dd5f
--- /dev/null
+++ b/plugins/nbd/nbd-standalone.c
@@ -0,0 +1,1369 @@
+/* nbdkit
+ * Copyright (C) 2017-2019 Red Hat Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * * Neither the name of Red Hat nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+ * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <config.h>
+
+#include <stdlib.h>
+#include <stddef.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <netdb.h>
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <assert.h>
+#include <pthread.h>
+#include <semaphore.h>
+
+#define NBDKIT_API_VERSION 2
+
+#include <nbdkit-plugin.h>
+#include "protocol.h"
+#include "byte-swapping.h"
+#include "cleanup.h"
+
+/* The per-transaction details */
+struct transaction {
+  uint64_t cookie;
+  sem_t sem;
+  void *buf;
+  uint64_t offset;
+  uint32_t count;
+  uint32_t err;
+  struct nbdkit_extents *extents;
+  struct transaction *next;
+};
+
+/* The per-connection handle */
+struct handle {
+  /* These fields are read-only once initialized */
+  int fd;
+  int flags;
+  int64_t size;
+  bool structured;
+  bool extents;
+  pthread_t reader;
+
+  /* Prevents concurrent threads from interleaving writes to server */
+  pthread_mutex_t write_lock;
+
+  pthread_mutex_t trans_lock; /* Covers access to all fields below */
+  struct transaction *trans;
+  uint64_t unique;
+  bool dead;
+};
+
+/* Connect to server via absolute name of Unix socket */
+static char *sockname;
+
+/* Connect to server via TCP socket */
+static const char *hostname;
+static const char *port;
+
+/* Human-readable server description */
+static char *servname;
+
+/* Name of export on remote server, default '', ignored for oldstyle */
+static const char *export;
+
+/* Number of retries */
+static unsigned long retry;
+
+/* True to share single server connection among all clients */
+static bool shared;
+static struct handle *shared_handle;
+
+static struct handle *nbd_open_handle (int readonly);
+static void nbd_close_handle (struct handle *h);
+
+static void
+nbd_unload (void)
+{
+  if (shared)
+    nbd_close_handle (shared_handle);
+  free (sockname);
+  free (servname);
+}
+
+/* Called for each key=value passed on the command line. This plugin
+ * accepts socket=<sockname> or hostname=<hostname>/port=<port>
+ * (exactly one connection required), and optional parameters
+ * export=<name>, retry=<n> and shared=<bool>.
+ */
+static int
+nbd_config (const char *key, const char *value)
+{
+  char *end;
+  int r;
+
+  if (strcmp (key, "socket") == 0) {
+    /* See FILENAMES AND PATHS in nbdkit-plugin(3) */
+    free (sockname);
+    sockname = nbdkit_absolute_path (value);
+    if (!sockname)
+      return -1;
+  }
+  else if (strcmp (key, "hostname") == 0)
+    hostname = value;
+  else if (strcmp (key, "port") == 0)
+    port = value;
+  else if (strcmp (key, "export") == 0)
+    export = value;
+  else if (strcmp (key, "retry") == 0) {
+    errno = 0;
+    retry = strtoul (value, &end, 0);
+    if (value == end || errno) {
+      nbdkit_error ("could not parse retry as integer (%s)", value);
+      return -1;
+    }
+  }
+  else if (strcmp (key, "shared") == 0) {
+    r = nbdkit_parse_bool (value);
+    if (r == -1)
+      return -1;
+    shared = r;
+  }
+  else {
+    nbdkit_error ("unknown parameter '%s'", key);
+    return -1;
+  }
+
+  return 0;
+}
+
+/* Check the user passed exactly one socket description. */
+static int
+nbd_config_complete (void)
+{
+  int r;
+
+  if (sockname) {
+    struct sockaddr_un sock;
+
+    if (hostname || port) {
+      nbdkit_error ("cannot mix Unix socket and TCP hostname/port parameters");
+      return -1;
+    }
+    if (strlen (sockname) > sizeof sock.sun_path) {
+      nbdkit_error ("socket file name too large");
+      return -1;
+    }
+    servname = strdup (sockname);
+  }
+  else {
+    if (!hostname) {
+      nbdkit_error ("must supply socket= or hostname= of external NBD server");
+      return -1;
+    }
+    if (!port)
+      port = "10809";
+    if (strchr (hostname, ':'))
+      r = asprintf (&servname, "[%s]:%s", hostname, port);
+    else
+      r = asprintf (&servname, "%s:%s", hostname, port);
+    if (r < 0) {
+      nbdkit_error ("asprintf: %m");
+      return -1;
+    }
+  }
+
+  if (!export)
+    export = "";
+
+  if (shared && (shared_handle = nbd_open_handle (false)) == NULL)
+    return -1;
+  return 0;
+}
+
+#define nbd_config_help \
+  "socket=<SOCKNAME> The Unix socket to connect to.\n" \
+  "hostname=<HOST> The hostname for the TCP socket to connect to.\n" \
+  "port=<PORT> TCP port or service name to use (default 10809).\n" \
+  "export=<NAME> Export name to connect to (default \"\").\n" \
+  "retry=<N> Retry connection up to N seconds (default 0).\n" \
+  "shared=<BOOL> True to share one server connection among all clients,\n" \
+  " rather than a connection per client (default false).\n" \
+
+#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL
+
+/* Read an entire buffer, returning 0 on success or -1 with errno set. */
+static int
+read_full (int fd, void *buf, size_t len)
+{
+  ssize_t r;
+
+  while (len) {
+    r = read (fd, buf, len);
+    if (r < 0) {
+      if (errno == EINTR || errno == EAGAIN)
+        continue;
+      return -1;
+    }
+    if (!r) {
+      /* Unexpected EOF */
+      errno = EBADMSG;
+      return -1;
+    }
+    buf += r;
+    len -= r;
+  }
+  return 0;
+}
+
+/* Write an entire buffer, returning 0 on success or -1 with errno set. */
+static int
+write_full (int fd, const void *buf, size_t len)
+{
+  ssize_t r;
+
+  while (len) {
+    r = write (fd, buf, len);
+    if (r < 0) {
+      if (errno == EINTR || errno == EAGAIN)
+        continue;
+      return -1;
+    }
+    buf += r;
+    len -= r;
+  }
+  return 0;
+}
+
+/* Called during transmission phases when there is no hope of
+ * resynchronizing with the server, and all further requests from the
+ * client will fail. Returns -1 for convenience. */
+static int
+nbd_mark_dead (struct handle *h)
+{
+  int err = errno;
+
+  ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock);
+  if (!h->dead) {
+    nbdkit_debug ("permanent failure while talking to server %s: %m",
+                  servname);
+    h->dead = true;
+  }
+  else if (!err)
+    errno = ESHUTDOWN;
+  /* NBD only accepts a limited set of errno values over the wire, and
+     nbdkit converts all other values to EINVAL. If we died due to an
+     errno value that cannot transmit over the wire, translate it to
+     ESHUTDOWN instead. */
+  if (err == EPIPE || err == EBADMSG)
+    nbdkit_set_error (ESHUTDOWN);
+  return -1;
+}
+
+/* Find and possibly remove the transaction corresponding to cookie
+   from the list. */
+static struct transaction *
+find_trans_by_cookie (struct handle *h, uint64_t cookie, bool remove)
+{
+  struct transaction **ptr;
+  struct transaction *trans;
+
+  ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock);
+  ptr = &h->trans;
+  while ((trans = *ptr) != NULL) {
+    if (cookie == trans->cookie)
+      break;
+    ptr = &trans->next;
+  }
+  if (trans && remove)
+    *ptr = trans->next;
+  return trans;
+}
+
+/* Send a request, return 0 on success or -1 on write failure. */
+static int
+nbd_request_raw (struct handle *h, uint16_t flags, uint16_t type,
+                 uint64_t offset, uint32_t count, uint64_t cookie,
+                 const void *buf)
+{
+  struct request req = {
+    .magic = htobe32 (NBD_REQUEST_MAGIC),
+    .flags = htobe16 (flags),
+    .type = htobe16 (type),
+    .handle = cookie, /* Opaque to server, so endianness doesn't matter */
+    .offset = htobe64 (offset),
+    .count = htobe32 (count),
+  };
+  int r;
+
+  ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->write_lock);
+  nbdkit_debug ("sending request type %d (%s), flags %#x, offset %#" PRIx64
+                ", count %#x, cookie %#" PRIx64, type, name_of_nbd_cmd (type),
+                flags, offset, count, cookie);
+  r = write_full (h->fd, &req, sizeof req);
+  if (buf && !r)
+    r = write_full (h->fd, buf, count);
+  return r;
+}
+
+/* Perform the request half of a transaction. On success, return the
+   transaction; on error return NULL. */
+static struct transaction *
+nbd_request_full (struct handle *h, uint16_t flags, uint16_t type,
+                  uint64_t offset, uint32_t count, const void *req_buf,
+                  void *rep_buf, struct nbdkit_extents *extents)
+{
+  int err;
+  struct transaction *trans;
+  uint64_t cookie;
+
+  trans = calloc (1, sizeof *trans);
+  if (!trans) {
+    nbdkit_error ("unable to track transaction: %m");
+    /* Still in sync with server, so don't mark connection dead */
+    return NULL;
+  }
+  if (sem_init (&trans->sem, 0, 0)) {
+    nbdkit_error ("unable to create semaphore: %m");
+    /* Still in sync with server, so don't mark connection dead */
+    free (trans);
+    return NULL;
+  }
+  trans->buf = rep_buf;
+  trans->count = rep_buf ? count : 0;
+  trans->offset = offset;
+  trans->extents = extents;
+  {
+    ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock);
+    if (h->dead)
+      goto err;
+    cookie = trans->cookie = h->unique++;
+    trans->next = h->trans;
+    h->trans = trans;
+  }
+  if (nbd_request_raw (h, flags, type, offset, count, cookie, req_buf) == 0)
+    return trans;
+  trans = find_trans_by_cookie (h, cookie, true);
+
+ err:
+  err = errno;
+  if (sem_destroy (&trans->sem))
+    abort ();
+  free (trans);
+  nbd_mark_dead (h);
+  errno = err;
+  return NULL;
+}
+
+/* Shorthand for nbd_request_full when no extra buffers are involved. */
+static struct transaction *
+nbd_request (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset,
+             uint32_t count)
+{
+  return nbd_request_full (h, flags, type, offset, count, NULL, NULL, NULL);
+}
+
+/* Read a reply, and look up the corresponding transaction.
+   Return the server's non-negative answer (converted to local errno
+   value) on success, or -1 on read failure. If structured replies
+   were negotiated, trans_out is set to NULL if there are still more replies
+   expected. */
+static int
+nbd_reply_raw (struct handle *h, struct transaction **trans_out)
+{
+  union {
+    struct simple_reply simple;
+    struct structured_reply structured;
+  } rep;
+  struct transaction *trans;
+  void *buf = NULL;
+  CLEANUP_FREE char *payload = NULL;
+  uint32_t count;
+  uint32_t id;
+  struct block_descriptor *extents = NULL;
+  size_t nextents = 0;
+  int error = NBD_SUCCESS;
+  bool more = false;
+  uint32_t len = 0; /* 0 except for structured reads */
+  uint64_t offset = 0; /* if len, absolute offset of structured read chunk */
+  bool zero = false; /* if len, whether to read or memset */
+  uint16_t errlen;
+
+  *trans_out = NULL;
+  /* magic and handle overlap between simple and structured replies */
+  if (read_full (h->fd, &rep, sizeof rep.simple))
+    return nbd_mark_dead (h);
+  rep.simple.magic = be32toh (rep.simple.magic);
+  switch (rep.simple.magic) {
+  case NBD_SIMPLE_REPLY_MAGIC:
+    nbdkit_debug ("received simple reply for cookie %#" PRIx64 ", status %s",
+                  rep.simple.handle,
+                  name_of_nbd_error (be32toh (rep.simple.error)));
+    error = be32toh (rep.simple.error);
+    break;
+  case NBD_STRUCTURED_REPLY_MAGIC:
+    if (!h->structured) {
+      nbdkit_error ("structured response without negotiation");
+      return nbd_mark_dead (h);
+    }
+    if (read_full (h->fd, sizeof rep.simple + (char *) &rep,
+                   sizeof rep - sizeof rep.simple))
+      return nbd_mark_dead (h);
+    rep.structured.flags = be16toh (rep.structured.flags);
+    rep.structured.type = be16toh (rep.structured.type);
+    rep.structured.length = be32toh (rep.structured.length);
+    nbdkit_debug ("received structured reply %s for cookie %#" PRIx64
+                  ", payload length %" PRId32,
+                  name_of_nbd_reply_type (rep.structured.type),
+                  rep.structured.handle, rep.structured.length);
+    if (rep.structured.length > 64 * 1024 * 1024) {
+      nbdkit_error ("structured reply length is suspiciously large: %" PRId32,
+                    rep.structured.length);
+      return nbd_mark_dead (h);
+    }
+    if (rep.structured.length) {
+      /* Special case for OFFSET_DATA in order to read tail of chunk
+         directly into final buffer later on */
+      len = (rep.structured.type == NBD_REPLY_TYPE_OFFSET_DATA &&
+             rep.structured.length > sizeof offset) ? sizeof offset :
+        rep.structured.length;
+      payload = malloc (len);
+      if (!payload) {
+        nbdkit_error ("reading structured reply payload: %m");
+        return nbd_mark_dead (h);
+      }
+      if (read_full (h->fd, payload, len))
+        return nbd_mark_dead (h);
+      len = 0;
+    }
+    more = !(rep.structured.flags & NBD_REPLY_FLAG_DONE);
+    switch (rep.structured.type) {
+    case NBD_REPLY_TYPE_NONE:
+      if (rep.structured.length) {
+        nbdkit_error ("NBD_REPLY_TYPE_NONE with invalid payload");
+        return nbd_mark_dead (h);
+      }
+      if (more) {
+        nbdkit_error ("NBD_REPLY_TYPE_NONE without done flag");
+        return nbd_mark_dead (h);
+      }
+      break;
+    case NBD_REPLY_TYPE_OFFSET_DATA:
+      if (rep.structured.length <= sizeof offset) {
+        nbdkit_error ("structured reply OFFSET_DATA too small");
+        return nbd_mark_dead (h);
+      }
+      memcpy (&offset, payload, sizeof offset);
+      offset = be64toh (offset);
+      len = rep.structured.length - sizeof offset;
+      break;
+    case NBD_REPLY_TYPE_OFFSET_HOLE:
+      if (rep.structured.length != sizeof offset + sizeof len) {
+        nbdkit_error ("structured reply OFFSET_HOLE size incorrect");
+        return nbd_mark_dead (h);
+      }
+      memcpy (&offset, payload, sizeof offset);
+      offset = be64toh (offset);
+      memcpy (&len, payload, sizeof len);
+      len = be32toh (len);
+      if (!len) {
+        nbdkit_error ("structured reply OFFSET_HOLE length incorrect");
+        return nbd_mark_dead (h);
+      }
+      zero = true;
+      break;
+    case NBD_REPLY_TYPE_BLOCK_STATUS:
+      if (!h->extents) {
+        nbdkit_error ("block status response without negotiation");
+        return nbd_mark_dead (h);
+      }
+      if (rep.structured.length < sizeof *extents ||
+          rep.structured.length % sizeof *extents != sizeof id) {
+        nbdkit_error ("structured reply OFFSET_HOLE size incorrect");
+        return nbd_mark_dead (h);
+      }
+      nextents = rep.structured.length / sizeof *extents;
+      extents = (struct block_descriptor *) &payload[sizeof id];
+      memcpy (&id, payload, sizeof id);
+      id = be32toh (id);
+      nbdkit_debug ("parsing %zu extents for context id %" PRId32,
+                    nextents, id);
+      break;
+    default:
+      if (!NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) {
+        nbdkit_error ("received unexpected structured reply %s",
+                      name_of_nbd_reply_type (rep.structured.type));
+        return nbd_mark_dead (h);
+      }
+
+      if (rep.structured.length < sizeof error + sizeof errlen) {
+        nbdkit_error ("structured reply error size incorrect");
+        return nbd_mark_dead (h);
+      }
+      memcpy (&errlen, payload + sizeof error, sizeof errlen);
+      errlen = be16toh (errlen);
+      if (errlen > rep.structured.length - sizeof error - sizeof errlen) {
+        nbdkit_error ("structured reply error message size incorrect");
+        return nbd_mark_dead (h);
+      }
+      memcpy (&error, payload, sizeof error);
+      error = be32toh (error);
+      if (errlen)
+        nbdkit_debug ("received structured error %s with message: %.*s",
+                      name_of_nbd_error (error), (int) errlen,
+                      payload + sizeof error + sizeof errlen);
+      else
+        nbdkit_debug ("received structured error %s without message",
+                      name_of_nbd_error (error));
+    }
+    break;
+
+  default:
+    nbdkit_error ("received unexpected magic in reply: %#" PRIx32,
+                  rep.simple.magic);
+    return nbd_mark_dead (h);
+  }
+
+  trans = find_trans_by_cookie (h, rep.simple.handle, !more);
+  if (!trans) {
+    nbdkit_error ("reply with unexpected cookie %#" PRIx64, rep.simple.handle);
+    return nbd_mark_dead (h);
+  }
+
+  buf = trans->buf;
+  count = trans->count;
+  if (nextents) {
+    if (!trans->extents) {
+      nbdkit_error ("block status response to a non-status command");
+      return nbd_mark_dead (h);
+    }
+    offset = trans->offset;
+    for (size_t i = 0; i < nextents; i++) {
+      /* We rely on the fact that NBDKIT_EXTENT_* match NBD_STATE_* */
+      if (nbdkit_add_extent (trans->extents, offset,
+                             be32toh (extents[i].length),
+                             be32toh (extents[i].status_flags)) == -1) {
+        error = errno;
+        break;
+      }
+      offset += be32toh (extents[i].length);
+    }
+  }
+  if (buf && h->structured && rep.simple.magic == NBD_SIMPLE_REPLY_MAGIC) {
+    nbdkit_error ("simple read reply when structured was expected");
+    return nbd_mark_dead (h);
+  }
+  if (len) {
+    if (!buf) {
+      nbdkit_error ("structured read response to a non-read command");
+      return nbd_mark_dead (h);
+    }
+    if (offset < trans->offset || offset > INT64_MAX ||
+        offset + len > trans->offset + count) {
+      nbdkit_error ("structured read reply with unexpected offset/length");
+      return nbd_mark_dead (h);
+    }
+    buf = (char *) buf + offset - trans->offset;
+    if (zero) {
+      memset (buf, 0, len);
+      buf = NULL;
+    }
+    else
+      count = len;
+  }
+
+  /* Thanks to structured replies, we must preserve an error in any
+     earlier chunk for replay during the final chunk. */
+  if (!more) {
+    *trans_out = trans;
+    if (!error)
+      error = trans->err;
+  }
+  else if (error && !trans->err)
+    trans->err = error;
+
+  /* Convert from wire value to local errno, and perform any final read */
+  switch (error) {
+  case NBD_SUCCESS:
+    if (buf && read_full (h->fd, buf, count))
+      return nbd_mark_dead (h);
+    return 0;
+  case NBD_EPERM:
+    return EPERM;
+  case NBD_EIO:
+    return EIO;
+  case NBD_ENOMEM:
+    return ENOMEM;
+  default:
+    nbdkit_debug ("unexpected error %d, squashing to EINVAL", error);
+    /* fallthrough */
+  case NBD_EINVAL:
+    return EINVAL;
+  case NBD_ENOSPC:
+    return ENOSPC;
+  case NBD_EOVERFLOW:
+    return EOVERFLOW;
+  case NBD_ESHUTDOWN:
+    return ESHUTDOWN;
+  }
+}
+
+/* Reader loop. */
+void *
+nbd_reader (void *handle)
+{
+  struct handle *h = handle;
+  bool done = false;
+  int r;
+
+  while (!done) {
+    struct transaction *trans;
+
+    r = nbd_reply_raw (h, &trans);
+    if (r >= 0) {
+      if (!trans)
+        nbdkit_debug ("partial reply handled, waiting for final reply");
+      else {
+        trans->err = r;
+        if (sem_post (&trans->sem)) {
+          nbdkit_error ("failed to post semaphore: %m");
+          abort ();
+        }
+      }
+    }
+    ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock);
+    done = h->dead;
+  }
+
+  /* Clean up any stranded in-flight requests */
+  r = ESHUTDOWN;
+  while (1) {
+    struct transaction *trans;
+
+    {
+      ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock);
+      trans = h->trans;
+      h->trans = trans ? trans->next : NULL;
+    }
+    if (!trans)
+      break;
+    trans->err = r;
+    if (sem_post (&trans->sem)) {
+      nbdkit_error ("failed to post semaphore: %m");
+      abort ();
+    }
+  }
+  return NULL;
+}
+
+/* Perform the reply half of a transaction. */
+static int
+nbd_reply (struct handle *h, struct transaction *trans)
+{
+  int err;
+
+  if (!trans) {
+    assert (errno);
+    return -1;
+  }
+
+  while ((err = sem_wait (&trans->sem)) == -1 && errno == EINTR)
+    /* try again */;
+  if (err) {
+    nbdkit_debug ("failed to wait on semaphore: %m");
+    err = EIO;
+  }
+  else
+    err = trans->err;
+  if (sem_destroy (&trans->sem))
+    abort ();
+  free (trans);
+  errno = err;
+  return err ? -1 : 0;
+}
+
+/* Receive response to @option into @reply, and consume any
+   payload. If @payload is non-NULL, caller must free *payload. Return
+   0 on success, or -1 if communication to server is no longer
+   possible. */
+static int
+nbd_newstyle_recv_option_reply (struct handle *h, uint32_t option,
+                                struct fixed_new_option_reply *reply,
+                                void **payload)
+{
+  CLEANUP_FREE char *buffer = NULL;
+
+  if (payload)
+    *payload = NULL;
+  if (read_full (h->fd, reply, sizeof *reply)) {
+    nbdkit_error ("unable to read option reply: %m");
+    return -1;
+  }
+  reply->magic = be64toh (reply->magic);
+  reply->option = be32toh (reply->option);
+  reply->reply = be32toh (reply->reply);
+  reply->replylen = be32toh (reply->replylen);
+  if (reply->magic != NBD_REP_MAGIC || reply->option != option) {
+    nbdkit_error ("unexpected option reply");
+    return -1;
+  }
+  if (reply->replylen) {
+    if (reply->reply == NBD_REP_ACK) {
+      nbdkit_error ("NBD_REP_ACK should not have replylen %" PRId32,
+                    reply->replylen);
+      return -1;
+    }
+    if (reply->replylen > 16 * 1024 * 1024) {
+      nbdkit_error ("option reply length is suspiciously large: %" PRId32,
+                    reply->replylen);
+      return -1;
+    }
+    /* buffer is a string for NBD_REP_ERR_*; adding a NUL terminator
+       makes that string easier to use, without hurting other reply
+       types where buffer is not a string */
+    buffer = malloc (reply->replylen + 1);
+    if (!buffer) {
+      nbdkit_error ("malloc: %m");
+      return -1;
+    }
+    if (read_full (h->fd, buffer, reply->replylen)) {
+      nbdkit_error ("unable to read option reply payload: %m");
+      return -1;
+    }
+    buffer[reply->replylen] = '\0';
+    if (!payload)
+      nbdkit_debug ("ignoring option reply payload");
+    else {
+      *payload = buffer;
+      buffer = NULL;
+    }
+  }
+  return 0;
+}
+
+/* Attempt to negotiate structured reads, block status, and NBD_OPT_GO.
+   Return 1 if haggling completed, 0 if haggling failed but
+   NBD_OPT_EXPORT_NAME is still viable, or -1 on inability to connect. */
+static int
+nbd_newstyle_haggle (struct handle *h)
+{
+  const char *const query = "base:allocation";
+  struct new_option opt;
+  uint32_t exportnamelen = htobe32 (strlen (export));
+  uint32_t nrqueries = htobe32 (1);
+  uint32_t querylen = htobe32 (strlen (query));
+  /* For now, we make no NBD_INFO_* requests, relying on the server to
+     send its defaults. TODO: nbdkit should let plugins report block
+     sizes, at which point we should request NBD_INFO_BLOCK_SIZE and
+     obey any sizes set by server. */
+  uint16_t nrinfos = htobe16 (0);
+  struct fixed_new_option_reply reply;
+
+  nbdkit_debug ("trying NBD_OPT_STRUCTURED_REPLY");
+  opt.version = htobe64 (NEW_VERSION);
+  opt.option = htobe32 (NBD_OPT_STRUCTURED_REPLY);
+  opt.optlen = htobe32 (0);
+  if (write_full (h->fd, &opt, sizeof opt)) {
+    nbdkit_error ("unable to request NBD_OPT_STRUCTURED_REPLY: %m");
+    return -1;
+  }
+  if (nbd_newstyle_recv_option_reply (h, NBD_OPT_STRUCTURED_REPLY, &reply,
+                                      NULL) < 0)
+    return -1;
+  if (reply.reply == NBD_REP_ACK) {
+    nbdkit_debug ("structured replies enabled, trying NBD_OPT_SET_META_CONTEXT");
+    h->structured = true;
+
+    opt.version = htobe64 (NEW_VERSION);
+    opt.option = htobe32 (NBD_OPT_SET_META_CONTEXT);
+    opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) +
+                          sizeof nrqueries + sizeof querylen + strlen (query));
+    if (write_full (h->fd, &opt, sizeof opt) ||
+        write_full (h->fd, &exportnamelen, sizeof exportnamelen) ||
+        write_full (h->fd, export, strlen (export)) ||
+        write_full (h->fd, &nrqueries, sizeof nrqueries) ||
+        write_full (h->fd, &querylen, sizeof querylen) ||
+        write_full (h->fd, query, strlen (query))) {
+      nbdkit_error ("unable to request NBD_OPT_SET_META_CONTEXT: %m");
+      return -1;
+    }
+    if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply,
+                                        NULL) < 0)
+      return -1;
+    if (reply.reply == NBD_REP_META_CONTEXT) {
+      /* Cheat: we asked for exactly one context. We could double
+         check that the server is replying with exactly the
+         "base:allocation" context, and then remember the id it tells
+         us to later confirm that responses to NBD_CMD_BLOCK_STATUS
+         match up; but in the absence of multiple contexts, it's
+         easier to just assume the server is compliant, and will reuse
+         the same id, without bothering to check further. */
+      nbdkit_debug ("extents enabled");
+      h->extents = true;
+      if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply,
+                                          NULL) < 0)
+        return -1;
+    }
+    if (reply.reply != NBD_REP_ACK) {
+      if (h->extents) {
+        nbdkit_error ("unexpected response to set meta context");
+        return -1;
+      }
+      nbdkit_debug ("ignoring meta context response %s",
+                    name_of_nbd_rep (reply.reply));
+    }
+  }
+  else {
+    nbdkit_debug ("structured replies disabled");
+  }
+
+  /* Try NBD_OPT_GO */
+  nbdkit_debug ("trying NBD_OPT_GO");
+  opt.version = htobe64 (NEW_VERSION);
+  opt.option = htobe32 (NBD_OPT_GO);
+  opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) +
+                        sizeof nrinfos);
+  if (write_full (h->fd, &opt, sizeof opt) ||
+      write_full (h->fd, &exportnamelen, sizeof exportnamelen) ||
+      write_full (h->fd, export, strlen (export)) ||
+      write_full (h->fd, &nrinfos, sizeof nrinfos)) {
+    nbdkit_error ("unable to request NBD_OPT_GO: %m");
+    return -1;
+  }
+  while (1) {
+    CLEANUP_FREE void *buffer;
+    struct fixed_new_option_reply_info_export *reply_export;
+    uint16_t info;
+
+    if (nbd_newstyle_recv_option_reply (h, NBD_OPT_GO, &reply, &buffer) < 0)
+      return -1;
+    switch (reply.reply) {
+    case NBD_REP_INFO:
+      /* Parse payload, but ignore all except NBD_INFO_EXPORT */
+      if (reply.replylen < 2) {
+        nbdkit_error ("NBD_REP_INFO reply too short");
+        return -1;
+      }
+      memcpy (&info, buffer, sizeof info);
+      info = be16toh (info);
+      switch (info) {
+      case NBD_INFO_EXPORT:
+        if (reply.replylen != sizeof *reply_export) {
+          nbdkit_error ("NBD_INFO_EXPORT reply wrong size");
+          return -1;
+        }
+        reply_export = buffer;
+        h->size = be64toh (reply_export->exportsize);
+        h->flags = be16toh (reply_export->eflags);
+        break;
+      default:
+        nbdkit_debug ("ignoring server info %d", info);
+      }
+      break;
+    case NBD_REP_ACK:
+      /* End of replies, valid if server already sent NBD_INFO_EXPORT,
+         observable since h->flags must contain NBD_FLAG_HAS_FLAGS */
+      assert (!buffer);
+      if (!h->flags) {
+        nbdkit_error ("server omitted NBD_INFO_EXPORT reply to NBD_OPT_GO");
+        return -1;
+      }
+      nbdkit_debug ("NBD_OPT_GO complete");
+      return 1;
+    case NBD_REP_ERR_UNSUP:
+      /* Special case this failure to fall back to NBD_OPT_EXPORT_NAME */
+      nbdkit_debug ("server lacks NBD_OPT_GO support");
+      return 0;
+    default:
+      /* Unexpected. Either the server sent a legitimate error or an
+         unexpected reply, but either way, we can't connect. */
+      if (NBD_REP_IS_ERR (reply.reply))
+        if (reply.replylen)
+          nbdkit_error ("server rejected NBD_OPT_GO with %s: %s",
+                        name_of_nbd_rep (reply.reply), (char *) buffer);
+        else
+          nbdkit_error ("server rejected NBD_OPT_GO with %s",
+                        name_of_nbd_rep (reply.reply));
+      else
+        nbdkit_error ("server used unexpected reply %s to NBD_OPT_GO",
+                      name_of_nbd_rep (reply.reply));
+      return -1;
+    }
+  }
+}
+
+/* Connect to a Unix socket, returning the fd on success */
+static int
+nbd_connect_unix (void)
+{
+  struct sockaddr_un sock = { .sun_family = AF_UNIX };
+  int fd;
+
+  nbdkit_debug ("connecting to Unix socket name=%s", sockname);
+  fd = socket (AF_UNIX, SOCK_STREAM, 0);
+  if (fd < 0) {
+    nbdkit_error ("socket: %m");
+    return -1;
+  }
+
+  /* We already validated length during nbd_config_complete */
+  assert (strlen (sockname) <= sizeof sock.sun_path);
+  memcpy (sock.sun_path, sockname, strlen (sockname));
+  if (connect (fd, (const struct sockaddr *) &sock, sizeof sock) < 0) {
+    nbdkit_error ("connect: %m");
+    return -1;
+  }
+  return fd;
+}
+
+/* Connect to a TCP socket, returning the fd on success */
+static int
+nbd_connect_tcp (void)
+{
+  struct addrinfo hints = { .ai_family = AF_UNSPEC,
+                            .ai_socktype = SOCK_STREAM, };
+  struct addrinfo *result, *rp;
+  int r;
+  const int optval = 1;
+  int fd;
+
+  nbdkit_debug ("connecting to TCP socket host=%s port=%s", hostname, port);
+  r = getaddrinfo (hostname, port, &hints, &result);
+  if (r != 0) {
+    nbdkit_error ("getaddrinfo: %s", gai_strerror (r));
+    return -1;
+  }
+
+  assert (result != NULL);
+
+  for (rp = result; rp; rp = rp->ai_next) {
+    fd = socket (rp->ai_family, rp->ai_socktype, rp->ai_protocol);
+    if (fd == -1)
+      continue;
+    if (connect (fd, rp->ai_addr, rp->ai_addrlen) != -1)
+      break;
+    close (fd);
+  }
+  freeaddrinfo (result);
+  if (rp == NULL) {
+    nbdkit_error ("connect: %m");
+    close (fd);
+    return -1;
+  }
+
+  if (setsockopt (fd, IPPROTO_TCP, TCP_NODELAY, &optval,
+                  sizeof (int)) == -1) {
+    nbdkit_error ("cannot set TCP_NODELAY option: %m");
+    close (fd);
+    return -1;
+  }
+  return fd;
+}
+
+/* Create the shared or per-connection handle. */
+static struct handle *
+nbd_open_handle (int readonly)
+{
+  struct handle *h;
+  struct old_handshake old;
+  uint64_t version;
+
+  h = calloc (1, sizeof *h);
+  if (h == NULL) {
+    nbdkit_error ("malloc: %m");
+    return NULL;
+  }
+
+ retry:
+  if (sockname)
+    h->fd = nbd_connect_unix ();
+  else
+    h->fd = nbd_connect_tcp ();
+  if (h->fd == -1) {
+    if (retry--) {
+      sleep (1);
+      goto retry;
+    }
+    goto err;
+  }
+
+  /* old and new handshake share same meaning of first 16 bytes */
+  if (read_full (h->fd, &old, offsetof (struct old_handshake, exportsize))) {
+    nbdkit_error ("unable to read magic: %m");
+    goto err;
+  }
+  if (strncmp (old.nbdmagic, "NBDMAGIC", sizeof old.nbdmagic)) {
+    nbdkit_error ("wrong magic, %s is not an NBD server", servname);
+    goto err;
+  }
+  version = be64toh (old.version);
+  if (version == OLD_VERSION) {
+    nbdkit_debug ("trying oldstyle connection");
+    if (read_full (h->fd,
+                   (char *) &old + offsetof (struct old_handshake, exportsize),
+                   sizeof old - offsetof (struct old_handshake, exportsize))) {
+      nbdkit_error ("unable to read old handshake: %m");
+      goto err;
+    }
+    h->size = be64toh (old.exportsize);
+    h->flags = be16toh (old.eflags);
+  }
+  else if (version == NEW_VERSION) {
+    uint16_t gflags;
+    uint32_t cflags;
+    struct new_option opt;
+    struct new_handshake_finish finish;
+    size_t expect;
+
+    nbdkit_debug ("trying newstyle connection");
+    if (read_full (h->fd, &gflags, sizeof gflags)) {
+      nbdkit_error ("unable to read global flags: %m");
+      goto err;
+    }
+    gflags = be16toh (gflags);
+    cflags = htobe32 (gflags & (NBD_FLAG_FIXED_NEWSTYLE | NBD_FLAG_NO_ZEROES));
+    if (write_full (h->fd, &cflags, sizeof cflags)) {
+      nbdkit_error ("unable to return global flags: %m");
+      goto err;
+    }
+
+    /* Prefer NBD_OPT_GO if possible */
+    if (gflags & NBD_FLAG_FIXED_NEWSTYLE) {
+      int rc = nbd_newstyle_haggle (h);
+      if (rc < 0)
+        goto err;
+      if (!rc)
+        goto export_name;
+    }
+    else {
+    export_name:
+      /* Option haggling untried or failed, use older NBD_OPT_EXPORT_NAME */
+      nbdkit_debug ("trying NBD_OPT_EXPORT_NAME");
+      opt.version = htobe64 (NEW_VERSION);
+      opt.option = htobe32 (NBD_OPT_EXPORT_NAME);
+      opt.optlen = htobe32 (strlen (export));
+      if (write_full (h->fd, &opt, sizeof opt) ||
+          write_full (h->fd, export, strlen (export))) {
+        nbdkit_error ("unable to request export '%s': %m", export);
+        goto err;
+      }
+      expect = sizeof finish;
+      if (gflags & NBD_FLAG_NO_ZEROES)
+        expect -= sizeof finish.zeroes;
+      if (read_full (h->fd, &finish, expect)) {
+        nbdkit_error ("unable to read new handshake: %m");
+        goto err;
+      }
+      h->size = be64toh (finish.exportsize);
+      h->flags = be16toh (finish.eflags);
+    }
+  }
+  else {
+    nbdkit_error ("unexpected version %#" PRIx64, version);
+    goto err;
+  }
+  if (readonly)
+    h->flags |= NBD_FLAG_READ_ONLY;
+
+  /* Spawn a dedicated reader thread */
+  if ((errno = pthread_mutex_init (&h->write_lock, NULL))) {
+    nbdkit_error ("failed to initialize write mutex: %m");
+    goto err;
+  }
+  if ((errno = pthread_mutex_init (&h->trans_lock, NULL))) {
+    nbdkit_error ("failed to initialize transaction mutex: %m");
+    pthread_mutex_destroy (&h->write_lock);
+    goto err;
+  }
+  if ((errno = pthread_create (&h->reader, NULL, nbd_reader, h))) {
+    nbdkit_error ("failed to initialize reader thread: %m");
+    pthread_mutex_destroy (&h->write_lock);
+    pthread_mutex_destroy (&h->trans_lock);
+    goto err;
+  }
+
+  return h;
+
+ err:
+  if (h->fd >= 0)
+    close (h->fd);
+  free (h);
+  return NULL;
+}
+
+/* Create the per-connection handle. */
+static void *
+nbd_open (int readonly)
+{
+  if (shared)
+    return shared_handle;
+  return nbd_open_handle (readonly);
+}
+
+/* Free up the shared or per-connection handle. */
+static void
+nbd_close_handle (struct handle *h)
+{
+  if (!h->dead) {
+    nbd_request_raw (h, 0, NBD_CMD_DISC, 0, 0, 0, NULL);
+    shutdown (h->fd, SHUT_WR);
+  }
+  if ((errno = pthread_join (h->reader, NULL)))
+    nbdkit_debug ("failed to join reader thread: %m");
+  close (h->fd);
+  pthread_mutex_destroy (&h->write_lock);
+  pthread_mutex_destroy (&h->trans_lock);
+  free (h);
+}
+
+/* Free up the per-connection handle. */
+static void
+nbd_close (void *handle)
+{
+  struct handle *h = handle;
+
+  if (!shared)
+    nbd_close_handle (h);
+}
+
+/* Get the file size. */
+static int64_t
+nbd_get_size (void *handle)
+{
+  struct handle *h = handle;
+
+  return h->size;
+}
+
+static int
+nbd_can_write (void *handle)
+{
+  struct handle *h = handle;
+
+  return !(h->flags & NBD_FLAG_READ_ONLY);
+}
+
+static int
+nbd_can_flush (void *handle)
+{
+  struct handle *h = handle;
+
+  return h->flags & NBD_FLAG_SEND_FLUSH;
+}
+
+static int
+nbd_is_rotational (void *handle)
+{
+  struct handle *h = handle;
+
+  return h->flags & NBD_FLAG_ROTATIONAL;
+}
+
+static int
+nbd_can_trim (void *handle)
+{
+  struct handle *h = handle;
+
+  return h->flags & NBD_FLAG_SEND_TRIM;
+}
+
+static int
+nbd_can_zero (void *handle)
+{
+  struct handle *h = handle;
+
+  return h->flags & NBD_FLAG_SEND_WRITE_ZEROES;
+}
+
+static int
+nbd_can_fua (void *handle)
+{
+  struct handle *h = handle;
+
+  return h->flags & NBD_FLAG_SEND_FUA ?
NBDKIT_FUA_NATIVE : NBDKIT_FUA_NONE; +} + +static int +nbd_can_multi_conn (void *handle) +{ + struct handle *h = handle; + + return h->flags & NBD_FLAG_CAN_MULTI_CONN; +} + +static int +nbd_can_cache (void *handle) +{ + struct handle *h = handle; + + if (h->flags & NBD_FLAG_SEND_CACHE) + return NBDKIT_CACHE_NATIVE; + return NBDKIT_CACHE_NONE; +} + +static int +nbd_can_extents (void *handle) +{ + struct handle *h = handle; + + return h->extents; +} + +/* Read data from the file. */ +static int +nbd_pread (void *handle, void *buf, uint32_t count, uint64_t offset, + uint32_t flags) +{ + struct handle *h = handle; + struct transaction *s; + + assert (!flags); + s = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf, NULL); + return nbd_reply (h, s); +} + +/* Write data to the file. */ +static int +nbd_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, + uint32_t flags) +{ + struct handle *h = handle; + struct transaction *s; + + assert (!(flags & ~NBDKIT_FLAG_FUA)); + s = nbd_request_full (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, + NBD_CMD_WRITE, offset, count, buf, NULL, NULL); + return nbd_reply (h, s); +} + +/* Write zeroes to the file. */ +static int +nbd_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + struct handle *h = handle; + struct transaction *s; + int f = 0; + + assert (!(flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM))); + assert (h->flags & NBD_FLAG_SEND_WRITE_ZEROES); + + if (!(flags & NBDKIT_FLAG_MAY_TRIM)) + f |= NBD_CMD_FLAG_NO_HOLE; + if (flags & NBDKIT_FLAG_FUA) + f |= NBD_CMD_FLAG_FUA; + s = nbd_request (h, f, NBD_CMD_WRITE_ZEROES, offset, count); + return nbd_reply (h, s); +} + +/* Trim a portion of the file. */ +static int +nbd_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + struct handle *h = handle; + struct transaction *s; + + assert (!(flags & ~NBDKIT_FLAG_FUA)); + s = nbd_request (h, flags & NBDKIT_FLAG_FUA ? 
NBD_CMD_FLAG_FUA : 0, + NBD_CMD_TRIM, offset, count); + return nbd_reply (h, s); +} + +/* Flush the file to disk. */ +static int +nbd_flush (void *handle, uint32_t flags) +{ + struct handle *h = handle; + struct transaction *s; + + assert (!flags); + s = nbd_request (h, 0, NBD_CMD_FLUSH, 0, 0); + return nbd_reply (h, s); +} + +/* Read extents of the file. */ +static int +nbd_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) +{ + struct handle *h = handle; + struct transaction *s; + + assert (!(flags & ~NBDKIT_FLAG_REQ_ONE) && h->extents); + s = nbd_request_full (h, flags & NBDKIT_FLAG_REQ_ONE ? NBD_CMD_FLAG_REQ_ONE : 0, + NBD_CMD_BLOCK_STATUS, offset, count, NULL, NULL, + extents); + return nbd_reply (h, s); +} + +/* Cache a portion of the file. */ +static int +nbd_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + struct handle *h = handle; + struct transaction *s; + + assert (!flags); + s = nbd_request (h, 0, NBD_CMD_CACHE, offset, count); + return nbd_reply (h, s); +} + +static struct nbdkit_plugin plugin = { + .name = "nbd", + .longname = "nbdkit nbd plugin", + .version = PACKAGE_VERSION, + .unload = nbd_unload, + .config = nbd_config, + .config_complete = nbd_config_complete, + .config_help = nbd_config_help, + .open = nbd_open, + .close = nbd_close, + .get_size = nbd_get_size, + .can_write = nbd_can_write, + .can_flush = nbd_can_flush, + .is_rotational = nbd_is_rotational, + .can_trim = nbd_can_trim, + .can_zero = nbd_can_zero, + .can_fua = nbd_can_fua, + .can_multi_conn = nbd_can_multi_conn, + .can_extents = nbd_can_extents, + .can_cache = nbd_can_cache, + .pread = nbd_pread, + .pwrite = nbd_pwrite, + .zero = nbd_zero, + .flush = nbd_flush, + .trim = nbd_trim, + .extents = nbd_extents, + .cache = nbd_cache, + .errno_is_preserved = 1, +}; + +NBDKIT_REGISTER_PLUGIN (plugin) diff --git a/plugins/nbd/Makefile.am b/plugins/nbd/Makefile.am index 7368e591..bfc2a838 100644 --- 
a/plugins/nbd/Makefile.am +++ b/plugins/nbd/Makefile.am @@ -1,5 +1,5 @@ # nbdkit -# Copyright (C) 2017 Red Hat Inc. +# Copyright (C) 2017-2019 Red Hat Inc. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are @@ -36,7 +36,6 @@ EXTRA_DIST = nbdkit-nbd-plugin.pod plugin_LTLIBRARIES = nbdkit-nbd-plugin.la nbdkit_nbd_plugin_la_SOURCES = \ - nbd.c \ $(top_srcdir)/include/nbdkit-plugin.h nbdkit_nbd_plugin_la_CPPFLAGS = \ @@ -54,6 +53,20 @@ nbdkit_nbd_plugin_la_LIBADD = \ $(top_builddir)/common/protocol/libprotocol.la \ $(top_builddir)/common/utils/libutils.la +# TODO: drop standalone version, which is locked at nbdkit 1.13.4 behavior, +# once libnbd is more commonly available with stable API. +if HAVE_LIBNBD +nbdkit_nbd_plugin_la_SOURCES += \ + nbd.c +nbdkit_nbd_plugin_la_CFLAGS += \ + $(LIBNBD_CFLAGS) +nbdkit_nbd_plugin_la_LIBADD += \ + $(LIBNBD_LIBS) +else !HAVE_LIBNBD +nbdkit_nbd_plugin_la_SOURCES += \ + nbd-standalone.c +endif !HAVE_LIBNBD + if HAVE_POD man_MANS = nbdkit-nbd-plugin.1 -- 2.20.1
libnbd uses the 'nbd_' prefix for its public functions, many of which clash with our internal uses. In order to compile with -lnbd, we need to rename our internal functions. No semantic change, although to test it, configure.ac now makes an actual decision on which of the two .c files to compile based on the presence of libnbd. Signed-off-by: Eric Blake <eblake@redhat.com> --- configure.ac | 3 +- plugins/nbd/nbd.c | 270 +++++++++++++++++++++++----------------------- 2 files changed, 136 insertions(+), 137 deletions(-) diff --git a/configure.ac b/configure.ac index 7d8fbd9f..07b282a4 100644 --- a/configure.ac +++ b/configure.ac @@ -725,8 +725,7 @@ AS_IF([test "$with_libnbd" != "no"],[ ], [AC_MSG_WARN([libnbd >= 0.1.3 not found, nbd plugin will be crippled])]) ]) -#test "x$LIBNBD_LIBS" != "x" -AM_CONDITIONAL([HAVE_LIBNBD], [false]) +AM_CONDITIONAL([HAVE_LIBNBD], [test "x$LIBNBD_LIBS" != "x"]) dnl Check for liblzma (only if you want to compile the xz filter). AC_ARG_WITH([liblzma], diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index d176dd5f..3c2622f1 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -108,14 +108,14 @@ static unsigned long retry; static bool shared; static struct handle *shared_handle; -static struct handle *nbd_open_handle (int readonly); -static void nbd_close_handle (struct handle *h); +static struct handle *nbdplug_open_handle (int readonly); +static void nbdplug_close_handle (struct handle *h); static void -nbd_unload (void) +nbdplug_unload (void) { if (shared) - nbd_close_handle (shared_handle); + nbdplug_close_handle (shared_handle); free (sockname); free (servname); } @@ -126,7 +126,7 @@ nbd_unload (void) * export=<name>, retry=<n> and shared=<bool>. */ static int -nbd_config (const char *key, const char *value) +nbdplug_config (const char *key, const char *value) { char *end; int r; @@ -168,7 +168,7 @@ nbd_config (const char *key, const char *value) /* Check the user passed exactly one socket description. 
*/ static int -nbd_config_complete (void) +nbdplug_config_complete (void) { int r; @@ -205,12 +205,12 @@ nbd_config_complete (void) if (!export) export = ""; - if (shared && (shared_handle = nbd_open_handle (false)) == NULL) + if (shared && (shared_handle = nbdplug_open_handle (false)) == NULL) return -1; return 0; } -#define nbd_config_help \ +#define nbdplug_config_help \ "socket=<SOCKNAME> The Unix socket to connect to.\n" \ "hostname=<HOST> The hostname for the TCP socket to connect to.\n" \ "port=<PORT> TCP port or service name to use (default 10809).\n" \ @@ -268,7 +268,7 @@ write_full (int fd, const void *buf, size_t len) * resynchronizing with the server, and all further requests from the * client will fail. Returns -1 for convenience. */ static int -nbd_mark_dead (struct handle *h) +nbdplug_mark_dead (struct handle *h) { int err = errno; @@ -311,9 +311,9 @@ find_trans_by_cookie (struct handle *h, uint64_t cookie, bool remove) /* Send a request, return 0 on success or -1 on write failure. */ static int -nbd_request_raw (struct handle *h, uint16_t flags, uint16_t type, - uint64_t offset, uint32_t count, uint64_t cookie, - const void *buf) +nbdplug_request_raw (struct handle *h, uint16_t flags, uint16_t type, + uint64_t offset, uint32_t count, uint64_t cookie, + const void *buf) { struct request req = { .magic = htobe32 (NBD_REQUEST_MAGIC), @@ -338,9 +338,9 @@ nbd_request_raw (struct handle *h, uint16_t flags, uint16_t type, /* Perform the request half of a transaction. On success, return the transaction; on error return NULL. 
*/ static struct transaction * -nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, - uint64_t offset, uint32_t count, const void *req_buf, - void *rep_buf, struct nbdkit_extents *extents) +nbdplug_request_full (struct handle *h, uint16_t flags, uint16_t type, + uint64_t offset, uint32_t count, const void *req_buf, + void *rep_buf, struct nbdkit_extents *extents) { int err; struct transaction *trans; @@ -370,7 +370,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, trans->next = h->trans; h->trans = trans; } - if (nbd_request_raw (h, flags, type, offset, count, cookie, req_buf) == 0) + if (nbdplug_request_raw (h, flags, type, offset, count, cookie, req_buf) == 0) return trans; trans = find_trans_by_cookie (h, cookie, true); @@ -379,17 +379,17 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, if (sem_destroy (&trans->sem)) abort (); free (trans); - nbd_mark_dead (h); + nbdplug_mark_dead (h); errno = err; return NULL; } -/* Shorthand for nbd_request_full when no extra buffers are involved. */ +/* Shorthand for nbdplug_request_full when no extra buffers are involved. */ static struct transaction * -nbd_request (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset, - uint32_t count) +nbdplug_request (struct handle *h, uint16_t flags, uint16_t type, + uint64_t offset, uint32_t count) { - return nbd_request_full (h, flags, type, offset, count, NULL, NULL, NULL); + return nbdplug_request_full (h, flags, type, offset, count, NULL, NULL, NULL); } /* Read a reply, and look up the corresponding transaction. @@ -398,7 +398,7 @@ nbd_request (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset, were negotiated, trans_out is set to NULL if there are still more replies expected. 
*/ static int -nbd_reply_raw (struct handle *h, struct transaction **trans_out) +nbdplug_reply_raw (struct handle *h, struct transaction **trans_out) { union { struct simple_reply simple; @@ -421,7 +421,7 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) *trans_out = NULL; /* magic and handle overlap between simple and structured replies */ if (read_full (h->fd, &rep, sizeof rep.simple)) - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); rep.simple.magic = be32toh (rep.simple.magic); switch (rep.simple.magic) { case NBD_SIMPLE_REPLY_MAGIC: @@ -433,11 +433,11 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) case NBD_STRUCTURED_REPLY_MAGIC: if (!h->structured) { nbdkit_error ("structured response without negotiation"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (read_full (h->fd, sizeof rep.simple + (char *) &rep, sizeof rep - sizeof rep.simple)) - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); rep.structured.flags = be16toh (rep.structured.flags); rep.structured.type = be16toh (rep.structured.type); rep.structured.length = be32toh (rep.structured.length); @@ -448,7 +448,7 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) if (rep.structured.length > 64 * 1024 * 1024) { nbdkit_error ("structured reply length is suspiciously large: %" PRId32, rep.structured.length); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (rep.structured.length) { /* Special case for OFFSET_DATA in order to read tail of chunk @@ -459,10 +459,10 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) payload = malloc (len); if (!payload) { nbdkit_error ("reading structured reply payload: %m"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (read_full (h->fd, payload, len)) - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); len = 0; } more = !(rep.structured.flags & NBD_REPLY_FLAG_DONE); @@ -470,17 +470,17 @@ nbd_reply_raw (struct handle *h, 
struct transaction **trans_out) case NBD_REPLY_TYPE_NONE: if (rep.structured.length) { nbdkit_error ("NBD_REPLY_TYPE_NONE with invalid payload"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (more) { nbdkit_error ("NBD_REPLY_TYPE_NONE without done flag"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } break; case NBD_REPLY_TYPE_OFFSET_DATA: if (rep.structured.length <= sizeof offset) { nbdkit_error ("structured reply OFFSET_DATA too small"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } memcpy (&offset, payload, sizeof offset); offset = be64toh (offset); @@ -489,7 +489,7 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) case NBD_REPLY_TYPE_OFFSET_HOLE: if (rep.structured.length != sizeof offset + sizeof len) { nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } memcpy (&offset, payload, sizeof offset); offset = be64toh (offset); @@ -497,19 +497,19 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) len = be32toh (len); if (!len) { nbdkit_error ("structured reply OFFSET_HOLE length incorrect"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } zero = true; break; case NBD_REPLY_TYPE_BLOCK_STATUS: if (!h->extents) { nbdkit_error ("block status response without negotiation"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (rep.structured.length < sizeof *extents || rep.structured.length % sizeof *extents != sizeof id) { nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } nextents = rep.structured.length / sizeof *extents; extents = (struct block_descriptor *) &payload[sizeof id]; @@ -522,18 +522,18 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) if (!NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) { nbdkit_error ("received unexpected structured reply %s", name_of_nbd_reply_type 
(rep.structured.type)); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (rep.structured.length < sizeof error + sizeof errlen) { nbdkit_error ("structured reply error size incorrect"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } memcpy (&errlen, payload + sizeof error, sizeof errlen); errlen = be16toh (errlen); if (errlen > rep.structured.length - sizeof error - sizeof errlen) { nbdkit_error ("structured reply error message size incorrect"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } memcpy (&error, payload, sizeof error); error = be32toh (error); @@ -550,13 +550,13 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) default: nbdkit_error ("received unexpected magic in reply: %#" PRIx32, rep.simple.magic); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } trans = find_trans_by_cookie (h, rep.simple.handle, !more); if (!trans) { nbdkit_error ("reply with unexpected cookie %#" PRIx64, rep.simple.handle); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } buf = trans->buf; @@ -564,7 +564,7 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) if (nextents) { if (!trans->extents) { nbdkit_error ("block status response to a non-status command"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } offset = trans->offset; for (size_t i = 0; i < nextents; i++) { @@ -580,17 +580,17 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) } if (buf && h->structured && rep.simple.magic == NBD_SIMPLE_REPLY_MAGIC) { nbdkit_error ("simple read reply when structured was expected"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (len) { if (!buf) { nbdkit_error ("structured read response to a non-read command"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (offset < trans->offset || offset > INT64_MAX || offset + len > trans->offset + count) { nbdkit_error ("structured read reply with unexpected offset/length"); - 
return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } buf = (char *) buf + offset - trans->offset; if (zero) { @@ -615,7 +615,7 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) switch (error) { case NBD_SUCCESS: if (buf && read_full (h->fd, buf, count)) - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); return 0; case NBD_EPERM: return EPERM; @@ -639,7 +639,7 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) /* Reader loop. */ void * -nbd_reader (void *handle) +nbdplug_reader (void *handle) { struct handle *h = handle; bool done = false; @@ -648,7 +648,7 @@ nbd_reader (void *handle) while (!done) { struct transaction *trans; - r = nbd_reply_raw (h, &trans); + r = nbdplug_reply_raw (h, &trans); if (r >= 0) { if (!trans) nbdkit_debug ("partial reply handled, waiting for final reply"); @@ -687,7 +687,7 @@ nbd_reader (void *handle) /* Perform the reply half of a transaction. */ static int -nbd_reply (struct handle *h, struct transaction *trans) +nbdplug_reply (struct handle *h, struct transaction *trans) { int err; @@ -716,9 +716,9 @@ nbd_reply (struct handle *h, struct transaction *trans) 0 on success, or -1 if communication to server is no longer possible. */ static int -nbd_newstyle_recv_option_reply (struct handle *h, uint32_t option, - struct fixed_new_option_reply *reply, - void **payload) +nbdplug_newstyle_recv_option_reply (struct handle *h, uint32_t option, + struct fixed_new_option_reply *reply, + void **payload) { CLEANUP_FREE char *buffer = NULL; @@ -774,7 +774,7 @@ nbd_newstyle_recv_option_reply (struct handle *h, uint32_t option, Return 1 if haggling completed, 0 if haggling failed but NBD_OPT_EXPORT_NAME is still viable, or -1 on inability to connect. 
*/ static int -nbd_newstyle_haggle (struct handle *h) +nbdplug_newstyle_haggle (struct handle *h) { const char *const query = "base:allocation"; struct new_option opt; @@ -796,8 +796,8 @@ nbd_newstyle_haggle (struct handle *h) nbdkit_error ("unable to request NBD_OPT_STRUCTURED_REPLY: %m"); return -1; } - if (nbd_newstyle_recv_option_reply (h, NBD_OPT_STRUCTURED_REPLY, &reply, - NULL) < 0) + if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_STRUCTURED_REPLY, &reply, + NULL) < 0) return -1; if (reply.reply == NBD_REP_ACK) { nbdkit_debug ("structured replies enabled, trying NBD_OPT_SET_META_CONTEXT"); @@ -816,8 +816,8 @@ nbd_newstyle_haggle (struct handle *h) nbdkit_error ("unable to request NBD_OPT_SET_META_CONTEXT: %m"); return -1; } - if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, - NULL) < 0) + if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, + NULL) < 0) return -1; if (reply.reply == NBD_REP_META_CONTEXT) { /* Cheat: we asked for exactly one context. We could double @@ -829,8 +829,8 @@ nbd_newstyle_haggle (struct handle *h) the same id, without bothering to check further. 
*/ nbdkit_debug ("extents enabled"); h->extents = true; - if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, - NULL) < 0) + if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, + &reply, NULL) < 0) return -1; } if (reply.reply != NBD_REP_ACK) { @@ -864,7 +864,7 @@ nbd_newstyle_haggle (struct handle *h) struct fixed_new_option_reply_info_export *reply_export; uint16_t info; - if (nbd_newstyle_recv_option_reply (h, NBD_OPT_GO, &reply, &buffer) < 0) + if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_GO, &reply, &buffer) < 0) return -1; switch (reply.reply) { case NBD_REP_INFO: @@ -923,7 +923,7 @@ nbd_newstyle_haggle (struct handle *h) /* Connect to a Unix socket, returning the fd on success */ static int -nbd_connect_unix (void) +nbdplug_connect_unix (void) { struct sockaddr_un sock = { .sun_family = AF_UNIX }; int fd; @@ -935,7 +935,7 @@ nbd_connect_unix (void) return -1; } - /* We already validated length during nbd_config_complete */ + /* We already validated length during nbdplug_config_complete */ assert (strlen (sockname) <= sizeof sock.sun_path); memcpy (sock.sun_path, sockname, strlen (sockname)); if (connect (fd, (const struct sockaddr *) &sock, sizeof sock) < 0) { @@ -947,7 +947,7 @@ nbd_connect_unix (void) /* Connect to a TCP socket, returning the fd on success */ static int -nbd_connect_tcp (void) +nbdplug_connect_tcp (void) { struct addrinfo hints = { .ai_family = AF_UNSPEC, .ai_socktype = SOCK_STREAM, }; @@ -991,7 +991,7 @@ nbd_connect_tcp (void) /* Create the shared or per-connection handle. 
*/ static struct handle * -nbd_open_handle (int readonly) +nbdplug_open_handle (int readonly) { struct handle *h; struct old_handshake old; @@ -1005,9 +1005,9 @@ nbd_open_handle (int readonly) retry: if (sockname) - h->fd = nbd_connect_unix (); + h->fd = nbdplug_connect_unix (); else - h->fd = nbd_connect_tcp (); + h->fd = nbdplug_connect_tcp (); if (h->fd == -1) { if (retry--) { sleep (1); @@ -1058,7 +1058,7 @@ nbd_open_handle (int readonly) /* Prefer NBD_OPT_GO if possible */ if (gflags & NBD_FLAG_FIXED_NEWSTYLE) { - int rc = nbd_newstyle_haggle (h); + int rc = nbdplug_newstyle_haggle (h); if (rc < 0) goto err; if (!rc) @@ -1104,7 +1104,7 @@ nbd_open_handle (int readonly) pthread_mutex_destroy (&h->write_lock); goto err; } - if ((errno = pthread_create (&h->reader, NULL, nbd_reader, h))) { + if ((errno = pthread_create (&h->reader, NULL, nbdplug_reader, h))) { nbdkit_error ("failed to initialize reader thread: %m"); pthread_mutex_destroy (&h->write_lock); pthread_mutex_destroy (&h->trans_lock); @@ -1122,19 +1122,19 @@ nbd_open_handle (int readonly) /* Create the per-connection handle. */ static void * -nbd_open (int readonly) +nbdplug_open (int readonly) { if (shared) return shared_handle; - return nbd_open_handle (readonly); + return nbdplug_open_handle (readonly); } /* Free up the shared or per-connection handle. */ static void -nbd_close_handle (struct handle *h) +nbdplug_close_handle (struct handle *h) { if (!h->dead) { - nbd_request_raw (h, 0, NBD_CMD_DISC, 0, 0, 0, NULL); + nbdplug_request_raw (h, 0, NBD_CMD_DISC, 0, 0, 0, NULL); shutdown (h->fd, SHUT_WR); } if ((errno = pthread_join (h->reader, NULL))) @@ -1147,17 +1147,17 @@ nbd_close_handle (struct handle *h) /* Free up the per-connection handle. */ static void -nbd_close (void *handle) +nbdplug_close (void *handle) { struct handle *h = handle; if (!shared) - nbd_close_handle (h); + nbdplug_close_handle (h); } /* Get the file size. 
*/ static int64_t -nbd_get_size (void *handle) +nbdplug_get_size (void *handle) { struct handle *h = handle; @@ -1165,7 +1165,7 @@ nbd_get_size (void *handle) } static int -nbd_can_write (void *handle) +nbdplug_can_write (void *handle) { struct handle *h = handle; @@ -1173,7 +1173,7 @@ nbd_can_write (void *handle) } static int -nbd_can_flush (void *handle) +nbdplug_can_flush (void *handle) { struct handle *h = handle; @@ -1181,7 +1181,7 @@ nbd_can_flush (void *handle) } static int -nbd_is_rotational (void *handle) +nbdplug_is_rotational (void *handle) { struct handle *h = handle; @@ -1189,7 +1189,7 @@ nbd_is_rotational (void *handle) } static int -nbd_can_trim (void *handle) +nbdplug_can_trim (void *handle) { struct handle *h = handle; @@ -1197,7 +1197,7 @@ nbd_can_trim (void *handle) } static int -nbd_can_zero (void *handle) +nbdplug_can_zero (void *handle) { struct handle *h = handle; @@ -1205,7 +1205,7 @@ nbd_can_zero (void *handle) } static int -nbd_can_fua (void *handle) +nbdplug_can_fua (void *handle) { struct handle *h = handle; @@ -1213,7 +1213,7 @@ nbd_can_fua (void *handle) } static int -nbd_can_multi_conn (void *handle) +nbdplug_can_multi_conn (void *handle) { struct handle *h = handle; @@ -1221,7 +1221,7 @@ nbd_can_multi_conn (void *handle) } static int -nbd_can_cache (void *handle) +nbdplug_can_cache (void *handle) { struct handle *h = handle; @@ -1231,7 +1231,7 @@ nbd_can_cache (void *handle) } static int -nbd_can_extents (void *handle) +nbdplug_can_extents (void *handle) { struct handle *h = handle; @@ -1240,38 +1240,38 @@ nbd_can_extents (void *handle) /* Read data from the file. 
*/ static int -nbd_pread (void *handle, void *buf, uint32_t count, uint64_t offset, - uint32_t flags) +nbdplug_pread (void *handle, void *buf, uint32_t count, uint64_t offset, + uint32_t flags) { struct handle *h = handle; struct transaction *s; assert (!flags); - s = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf, NULL); - return nbd_reply (h, s); + s = nbdplug_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf, NULL); + return nbdplug_reply (h, s); } /* Write data to the file. */ static int -nbd_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, - uint32_t flags) +nbdplug_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, + uint32_t flags) { struct handle *h = handle; struct transaction *s; assert (!(flags & ~NBDKIT_FLAG_FUA)); - s = nbd_request_full (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, - NBD_CMD_WRITE, offset, count, buf, NULL, NULL); - return nbd_reply (h, s); + s = nbdplug_request_full (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, + NBD_CMD_WRITE, offset, count, buf, NULL, NULL); + return nbdplug_reply (h, s); } /* Write zeroes to the file. */ static int -nbd_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +nbdplug_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) { struct handle *h = handle; struct transaction *s; - int f = 0; + uint32_t f = 0; assert (!(flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM))); assert (h->flags & NBD_FLAG_SEND_WRITE_ZEROES); @@ -1280,89 +1280,89 @@ nbd_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) f |= NBD_CMD_FLAG_NO_HOLE; if (flags & NBDKIT_FLAG_FUA) f |= NBD_CMD_FLAG_FUA; - s = nbd_request (h, f, NBD_CMD_WRITE_ZEROES, offset, count); - return nbd_reply (h, s); + s = nbdplug_request (h, f, NBD_CMD_WRITE_ZEROES, offset, count); + return nbdplug_reply (h, s); } /* Trim a portion of the file. 
*/ static int -nbd_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +nbdplug_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags) { struct handle *h = handle; struct transaction *s; assert (!(flags & ~NBDKIT_FLAG_FUA)); - s = nbd_request (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, - NBD_CMD_TRIM, offset, count); - return nbd_reply (h, s); + s = nbdplug_request (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, + NBD_CMD_TRIM, offset, count); + return nbdplug_reply (h, s); } /* Flush the file to disk. */ static int -nbd_flush (void *handle, uint32_t flags) +nbdplug_flush (void *handle, uint32_t flags) { struct handle *h = handle; struct transaction *s; assert (!flags); - s = nbd_request (h, 0, NBD_CMD_FLUSH, 0, 0); - return nbd_reply (h, s); + s = nbdplug_request (h, 0, NBD_CMD_FLUSH, 0, 0); + return nbdplug_reply (h, s); } /* Read extents of the file. */ static int -nbd_extents (void *handle, uint32_t count, uint64_t offset, - uint32_t flags, struct nbdkit_extents *extents) +nbdplug_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) { struct handle *h = handle; struct transaction *s; + uint32_t f = flags & NBDKIT_FLAG_REQ_ONE ? NBD_CMD_FLAG_REQ_ONE : 0; assert (!(flags & ~NBDKIT_FLAG_REQ_ONE) && h->extents); - s = nbd_request_full (h, flags & NBDKIT_FLAG_REQ_ONE ? NBD_CMD_FLAG_REQ_ONE : 0, - NBD_CMD_BLOCK_STATUS, offset, count, NULL, NULL, - extents); - return nbd_reply (h, s); + s = nbdplug_request_full (h, f, NBD_CMD_BLOCK_STATUS, offset, count, NULL, + NULL, extents); + return nbdplug_reply (h, s); } /* Cache a portion of the file. 
*/ static int -nbd_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +nbdplug_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) { struct handle *h = handle; struct transaction *s; assert (!flags); - s = nbd_request (h, 0, NBD_CMD_CACHE, offset, count); - return nbd_reply (h, s); + s = nbdplug_request (h, 0, NBD_CMD_CACHE, offset, count); + return nbdplug_reply (h, s); } static struct nbdkit_plugin plugin = { .name = "nbd", .longname = "nbdkit nbd plugin", .version = PACKAGE_VERSION, - .unload = nbd_unload, - .config = nbd_config, - .config_complete = nbd_config_complete, - .config_help = nbd_config_help, - .open = nbd_open, - .close = nbd_close, - .get_size = nbd_get_size, - .can_write = nbd_can_write, - .can_flush = nbd_can_flush, - .is_rotational = nbd_is_rotational, - .can_trim = nbd_can_trim, - .can_zero = nbd_can_zero, - .can_fua = nbd_can_fua, - .can_multi_conn = nbd_can_multi_conn, - .can_extents = nbd_can_extents, - .can_cache = nbd_can_cache, - .pread = nbd_pread, - .pwrite = nbd_pwrite, - .zero = nbd_zero, - .flush = nbd_flush, - .trim = nbd_trim, - .extents = nbd_extents, - .cache = nbd_cache, + .unload = nbdplug_unload, + .config = nbdplug_config, + .config_complete = nbdplug_config_complete, + .config_help = nbdplug_config_help, + .open = nbdplug_open, + .close = nbdplug_close, + .get_size = nbdplug_get_size, + .can_write = nbdplug_can_write, + .can_flush = nbdplug_can_flush, + .is_rotational = nbdplug_is_rotational, + .can_trim = nbdplug_can_trim, + .can_zero = nbdplug_can_zero, + .can_fua = nbdplug_can_fua, + .can_multi_conn = nbdplug_can_multi_conn, + .can_extents = nbdplug_can_extents, + .can_cache = nbdplug_can_cache, + .pread = nbdplug_pread, + .pwrite = nbdplug_pwrite, + .zero = nbdplug_zero, + .flush = nbdplug_flush, + .trim = nbdplug_trim, + .extents = nbdplug_extents, + .cache = nbdplug_cache, .errno_is_preserved = 1, }; -- 2.20.1
Eric Blake
2019-Jun-12 21:00 UTC
[Libguestfs] [nbdkit PATCH v3 3/5] nbd: Use libnbd 0.1.3+
This conversion should be feature compatible with the standalone nbd code. Note that the use of libnbd makes the binary for this particular plugin fall under an LGPLv2+ license rather than BSD; but the source code in nbd.c remains BSD.

A lot of code simply disappears, now that I'm no longer directly utilizing the NBD protocol files but relying on libnbd. Coordination between threads from nbdkit and a central state machine thread is gated by a lock around the pending transaction list, reusing the earlier trick of a semaphore to wake up the right thread after the server's reply, regardless of out-of-order handling from the server.

Benchmark-wise, using the same setup as in commit e897ed70, I see a slight slowdown:

  Pre-patch, the runs averaged 0.754s, 2.17E+08 bits/s
  Post-patch, the runs averaged 0.849s, 1.93E+08 bits/s

The explanation is that pre-patch, one nbdkit thread can be write()ing the next request to the server at the same time the reader thread is read()ing the previous reply; post-patch, libnbd's current structure handles all I/O under a lock such that there is no simultaneous bidirectional traffic. There are discussions about adding future libnbd APIs to address this limitation.

The trickiest part (for me) was the fact that since the state machine loop is in a separate thread from the initial requests, the loop is often blocked on just POLLIN for the state machine fd. My initial attempt tried to grab the transaction lock before calling nbd_aio_FOO to create the cookie, and generally worked in cases where the server's reply would then wake up the reader. But that was even slower (due to a larger lock section), and also hit a problem where a large NBD_CMD_WRITE would block on writes, which the poll() was not expecting (best seen when running tests/test-nozero.sh with LIBNBD_DEBUG=1). I tried blindly setting POLLOUT as a valid reason to break the poll without regard to the current state, but that burned a lot of CPU as it fired even when there was no progress.
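The CPU burn from unconditional POLLOUT can be reproduced in isolation. The following is only a minimal sketch, not code from this patch: it uses a plain pipe instead of the libnbd socket, and the helper names (poll_one, count_wakeups, demo_idle_wakeups) are hypothetical. It assumes the usual behaviour that an idle fd with free buffer space always reports POLLOUT, while an idle fd with nothing to read never reports POLLIN.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper, not part of nbdkit or libnbd: poll one fd for
 * the given events, waiting at most timeout_ms.  Returns poll()'s
 * result: >0 if ready, 0 on timeout, -1 on error. */
static int
poll_one (int fd, short events, int timeout_ms)
{
  struct pollfd pfd = { .fd = fd, .events = events };

  return poll (&pfd, 1, timeout_ms);
}

/* Count how many of 'iterations' polls report the fd as ready while
 * nothing else is touching it. */
static int
count_wakeups (int fd, short events, int iterations)
{
  int i, ready = 0;

  assert (iterations >= 0);
  for (i = 0; i < iterations; i++)
    if (poll_one (fd, events, 10) > 0)
      ready++;
  return ready;
}

/* Compare idle wakeups for POLLIN (read end) and POLLOUT (write end)
 * of a fresh, empty pipe.  Returns 0 on success, -1 if pipe() fails. */
static int
demo_idle_wakeups (int *in_wakeups, int *out_wakeups)
{
  int fds[2];

  if (pipe (fds) == -1)
    return -1;
  /* Nothing has been written, so each POLLIN poll times out. */
  *in_wakeups = count_wakeups (fds[0], POLLIN, 5);
  /* The pipe has free space, so each POLLOUT poll returns at once. */
  *out_wakeups = count_wakeups (fds[1], POLLOUT, 5);
  close (fds[0]);
  close (fds[1]);
  return 0;
}
```

A loop that always asks for POLLOUT therefore never sleeps in poll() while the fd has buffer space, which matches the wasted CPU described above; asking for POLLOUT only when the state machine actually wants to write avoids that.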
So I finally settled on using a pipe-to-self and poll()ing on two fds at once (the libnbd state machine and my pipe) to know when the loop needed to start processing because another thread started a command, where the pipe-to-self also makes it possible to not need to hold the transaction lock while generating a cookie. Signed-off-by: Eric Blake <eblake@redhat.com> --- Shoot - I need to rerun the benchmark numbers. I'll follow up once I've done that, but it is just an amended commit. plugins/nbd/nbd.c | 1100 +++++++++------------------------------ plugins/nbd/Makefile.am | 8 +- 2 files changed, 244 insertions(+), 864 deletions(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 3c2622f1..5d79c981 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -49,43 +49,35 @@ #include <assert.h> #include <pthread.h> #include <semaphore.h> +#include <poll.h> + +#include <libnbd.h> #define NBDKIT_API_VERSION 2 #include <nbdkit-plugin.h> -#include "protocol.h" #include "byte-swapping.h" #include "cleanup.h" /* The per-transaction details */ struct transaction { - uint64_t cookie; + int64_t cookie; sem_t sem; - void *buf; - uint64_t offset; - uint32_t count; uint32_t err; - struct nbdkit_extents *extents; struct transaction *next; }; /* The per-connection handle */ struct handle { /* These fields are read-only once initialized */ - int fd; - int flags; - int64_t size; - bool structured; - bool extents; + struct nbd_handle *nbd; + int fd; /* Cache of nbd_aio_get_fd */ + int fds[2]; /* Pipe for kicking the reader thread */ + bool readonly; pthread_t reader; - /* Prevents concurrent threads from interleaving writes to server */ - pthread_mutex_t write_lock; - - pthread_mutex_t trans_lock; /* Covers access to all fields below */ - struct transaction *trans; - uint64_t unique; - bool dead; + pthread_mutex_t trans_lock; /* Covers access to trans list */ + struct transaction *trans; /* List of pending transactions */ }; /* Connect to server via absolute name of Unix socket 
*/ @@ -95,9 +87,6 @@ static char *sockname; static const char *hostname; static const char *port; -/* Human-readable server description */ -static char *servname; - /* Name of export on remote server, default '', ignored for oldstyle */ static const char *export; @@ -117,7 +106,6 @@ nbdplug_unload (void) if (shared) nbdplug_close_handle (shared_handle); free (sockname); - free (servname); } /* Called for each key=value passed on the command line. This plugin @@ -170,8 +158,6 @@ nbdplug_config (const char *key, const char *value) static int nbdplug_config_complete (void) { - int r; - if (sockname) { struct sockaddr_un sock; @@ -183,7 +169,6 @@ nbdplug_config_complete (void) nbdkit_error ("socket file name too large"); return -1; } - servname = strdup (sockname); } else { if (!hostname) { @@ -192,14 +177,6 @@ nbdplug_config_complete (void) } if (!port) port = "10809"; - if (strchr (hostname, ':')) - r = asprintf (&servname, "[%s]:%s", hostname, port); - else - r = asprintf (&servname, "%s:%s", hostname, port); - if (r < 0) { - nbdkit_error ("asprintf: %m"); - return -1; - } } if (!export) @@ -221,451 +198,74 @@ nbdplug_config_complete (void) #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL -/* Read an entire buffer, returning 0 on success or -1 with errno set. */ -static int -read_full (int fd, void *buf, size_t len) -{ - ssize_t r; - - while (len) { - r = read (fd, buf, len); - if (r < 0) { - if (errno == EINTR || errno == EAGAIN) - continue; - return -1; - } - if (!r) { - /* Unexpected EOF */ - errno = EBADMSG; - return -1; - } - buf += r; - len -= r; - } - return 0; -} - -/* Write an entire buffer, returning 0 on success or -1 with errno set. 
*/ -static int -write_full (int fd, const void *buf, size_t len) -{ - ssize_t r; - - while (len) { - r = write (fd, buf, len); - if (r < 0) { - if (errno == EINTR || errno == EAGAIN) - continue; - return -1; - } - buf += r; - len -= r; - } - return 0; -} - -/* Called during transmission phases when there is no hope of - * resynchronizing with the server, and all further requests from the - * client will fail. Returns -1 for convenience. */ -static int -nbdplug_mark_dead (struct handle *h) -{ - int err = errno; - - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); - if (!h->dead) { - nbdkit_debug ("permanent failure while talking to server %s: %m", - servname); - h->dead = true; - } - else if (!err) - errno = ESHUTDOWN; - /* NBD only accepts a limited set of errno values over the wire, and - nbdkit converts all other values to EINVAL. If we died due to an - errno value that cannot transmit over the wire, translate it to - ESHUTDOWN instead. */ - if (err == EPIPE || err == EBADMSG) - nbdkit_set_error (ESHUTDOWN); - return -1; -} - -/* Find and possibly remove the transaction corresponding to cookie - from the list. */ -static struct transaction * -find_trans_by_cookie (struct handle *h, uint64_t cookie, bool remove) -{ - struct transaction **ptr; - struct transaction *trans; - - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); - ptr = &h->trans; - while ((trans = *ptr) != NULL) { - if (cookie == trans->cookie) - break; - ptr = &trans->next; - } - if (trans && remove) - *ptr = trans->next; - return trans; -} - -/* Send a request, return 0 on success or -1 on write failure. 
*/ -static int -nbdplug_request_raw (struct handle *h, uint16_t flags, uint16_t type, - uint64_t offset, uint32_t count, uint64_t cookie, - const void *buf) -{ - struct request req = { - .magic = htobe32 (NBD_REQUEST_MAGIC), - .flags = htobe16 (flags), - .type = htobe16 (type), - .handle = cookie, /* Opaque to server, so endianness doesn't matter */ - .offset = htobe64 (offset), - .count = htobe32 (count), - }; - int r; - - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->write_lock); - nbdkit_debug ("sending request type %d (%s), flags %#x, offset %#" PRIx64 - ", count %#x, cookie %#" PRIx64, type, name_of_nbd_cmd (type), - flags, offset, count, cookie); - r = write_full (h->fd, &req, sizeof req); - if (buf && !r) - r = write_full (h->fd, buf, count); - return r; -} - -/* Perform the request half of a transaction. On success, return the - transaction; on error return NULL. */ -static struct transaction * -nbdplug_request_full (struct handle *h, uint16_t flags, uint16_t type, - uint64_t offset, uint32_t count, const void *req_buf, - void *rep_buf, struct nbdkit_extents *extents) -{ - int err; - struct transaction *trans; - uint64_t cookie; - - trans = calloc (1, sizeof *trans); - if (!trans) { - nbdkit_error ("unable to track transaction: %m"); - /* Still in sync with server, so don't mark connection dead */ - return NULL; - } - if (sem_init (&trans->sem, 0, 0)) { - nbdkit_error ("unable to create semaphore: %m"); - /* Still in sync with server, so don't mark connection dead */ - free (trans); - return NULL; - } - trans->buf = rep_buf; - trans->count = rep_buf ? 
count : 0; - trans->offset = offset; - trans->extents = extents; - { - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); - if (h->dead) - goto err; - cookie = trans->cookie = h->unique++; - trans->next = h->trans; - h->trans = trans; - } - if (nbdplug_request_raw (h, flags, type, offset, count, cookie, req_buf) == 0) - return trans; - trans = find_trans_by_cookie (h, cookie, true); - - err: - err = errno; - if (sem_destroy (&trans->sem)) - abort (); - free (trans); - nbdplug_mark_dead (h); - errno = err; - return NULL; -} - -/* Shorthand for nbdplug_request_full when no extra buffers are involved. */ -static struct transaction * -nbdplug_request (struct handle *h, uint16_t flags, uint16_t type, - uint64_t offset, uint32_t count) -{ - return nbdplug_request_full (h, flags, type, offset, count, NULL, NULL, NULL); -} - -/* Read a reply, and look up the corresponding transaction. - Return the server's non-negative answer (converted to local errno - value) on success, or -1 on read failure. If structured replies - were negotiated, trans_out is set to NULL if there are still more replies - expected. 
*/ -static int -nbdplug_reply_raw (struct handle *h, struct transaction **trans_out) -{ - union { - struct simple_reply simple; - struct structured_reply structured; - } rep; - struct transaction *trans; - void *buf = NULL; - CLEANUP_FREE char *payload = NULL; - uint32_t count; - uint32_t id; - struct block_descriptor *extents = NULL; - size_t nextents = 0; - int error = NBD_SUCCESS; - bool more = false; - uint32_t len = 0; /* 0 except for structured reads */ - uint64_t offset = 0; /* if len, absolute offset of structured read chunk */ - bool zero = false; /* if len, whether to read or memset */ - uint16_t errlen; - - *trans_out = NULL; - /* magic and handle overlap between simple and structured replies */ - if (read_full (h->fd, &rep, sizeof rep.simple)) - return nbdplug_mark_dead (h); - rep.simple.magic = be32toh (rep.simple.magic); - switch (rep.simple.magic) { - case NBD_SIMPLE_REPLY_MAGIC: - nbdkit_debug ("received simple reply for cookie %#" PRIx64 ", status %s", - rep.simple.handle, - name_of_nbd_error (be32toh (rep.simple.error))); - error = be32toh (rep.simple.error); - break; - case NBD_STRUCTURED_REPLY_MAGIC: - if (!h->structured) { - nbdkit_error ("structured response without negotiation"); - return nbdplug_mark_dead (h); - } - if (read_full (h->fd, sizeof rep.simple + (char *) &rep, - sizeof rep - sizeof rep.simple)) - return nbdplug_mark_dead (h); - rep.structured.flags = be16toh (rep.structured.flags); - rep.structured.type = be16toh (rep.structured.type); - rep.structured.length = be32toh (rep.structured.length); - nbdkit_debug ("received structured reply %s for cookie %#" PRIx64 - ", payload length %" PRId32, - name_of_nbd_reply_type (rep.structured.type), - rep.structured.handle, rep.structured.length); - if (rep.structured.length > 64 * 1024 * 1024) { - nbdkit_error ("structured reply length is suspiciously large: %" PRId32, - rep.structured.length); - return nbdplug_mark_dead (h); - } - if (rep.structured.length) { - /* Special case for 
OFFSET_DATA in order to read tail of chunk - directly into final buffer later on */ - len = (rep.structured.type == NBD_REPLY_TYPE_OFFSET_DATA && - rep.structured.length > sizeof offset) ? sizeof offset : - rep.structured.length; - payload = malloc (len); - if (!payload) { - nbdkit_error ("reading structured reply payload: %m"); - return nbdplug_mark_dead (h); - } - if (read_full (h->fd, payload, len)) - return nbdplug_mark_dead (h); - len = 0; - } - more = !(rep.structured.flags & NBD_REPLY_FLAG_DONE); - switch (rep.structured.type) { - case NBD_REPLY_TYPE_NONE: - if (rep.structured.length) { - nbdkit_error ("NBD_REPLY_TYPE_NONE with invalid payload"); - return nbdplug_mark_dead (h); - } - if (more) { - nbdkit_error ("NBD_REPLY_TYPE_NONE without done flag"); - return nbdplug_mark_dead (h); - } - break; - case NBD_REPLY_TYPE_OFFSET_DATA: - if (rep.structured.length <= sizeof offset) { - nbdkit_error ("structured reply OFFSET_DATA too small"); - return nbdplug_mark_dead (h); - } - memcpy (&offset, payload, sizeof offset); - offset = be64toh (offset); - len = rep.structured.length - sizeof offset; - break; - case NBD_REPLY_TYPE_OFFSET_HOLE: - if (rep.structured.length != sizeof offset + sizeof len) { - nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); - return nbdplug_mark_dead (h); - } - memcpy (&offset, payload, sizeof offset); - offset = be64toh (offset); - memcpy (&len, payload, sizeof len); - len = be32toh (len); - if (!len) { - nbdkit_error ("structured reply OFFSET_HOLE length incorrect"); - return nbdplug_mark_dead (h); - } - zero = true; - break; - case NBD_REPLY_TYPE_BLOCK_STATUS: - if (!h->extents) { - nbdkit_error ("block status response without negotiation"); - return nbdplug_mark_dead (h); - } - if (rep.structured.length < sizeof *extents || - rep.structured.length % sizeof *extents != sizeof id) { - nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); - return nbdplug_mark_dead (h); - } - nextents = rep.structured.length / sizeof 
*extents; - extents = (struct block_descriptor *) &payload[sizeof id]; - memcpy (&id, payload, sizeof id); - id = be32toh (id); - nbdkit_debug ("parsing %zu extents for context id %" PRId32, - nextents, id); - break; - default: - if (!NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) { - nbdkit_error ("received unexpected structured reply %s", - name_of_nbd_reply_type (rep.structured.type)); - return nbdplug_mark_dead (h); - } - - if (rep.structured.length < sizeof error + sizeof errlen) { - nbdkit_error ("structured reply error size incorrect"); - return nbdplug_mark_dead (h); - } - memcpy (&errlen, payload + sizeof error, sizeof errlen); - errlen = be16toh (errlen); - if (errlen > rep.structured.length - sizeof error - sizeof errlen) { - nbdkit_error ("structured reply error message size incorrect"); - return nbdplug_mark_dead (h); - } - memcpy (&error, payload, sizeof error); - error = be32toh (error); - if (errlen) - nbdkit_debug ("received structured error %s with message: %.*s", - name_of_nbd_error (error), (int) errlen, - payload + sizeof error + sizeof errlen); - else - nbdkit_debug ("received structured error %s without message", - name_of_nbd_error (error)); - } - break; - - default: - nbdkit_error ("received unexpected magic in reply: %#" PRIx32, - rep.simple.magic); - return nbdplug_mark_dead (h); - } - - trans = find_trans_by_cookie (h, rep.simple.handle, !more); - if (!trans) { - nbdkit_error ("reply with unexpected cookie %#" PRIx64, rep.simple.handle); - return nbdplug_mark_dead (h); - } - - buf = trans->buf; - count = trans->count; - if (nextents) { - if (!trans->extents) { - nbdkit_error ("block status response to a non-status command"); - return nbdplug_mark_dead (h); - } - offset = trans->offset; - for (size_t i = 0; i < nextents; i++) { - /* We rely on the fact that NBDKIT_EXTENT_* match NBD_STATE_* */ - if (nbdkit_add_extent (trans->extents, offset, - be32toh (extents[i].length), - be32toh (extents[i].status_flags)) == -1) { - error = errno; - 
break; - } - offset += be32toh (extents[i].length); - } - } - if (buf && h->structured && rep.simple.magic == NBD_SIMPLE_REPLY_MAGIC) { - nbdkit_error ("simple read reply when structured was expected"); - return nbdplug_mark_dead (h); - } - if (len) { - if (!buf) { - nbdkit_error ("structured read response to a non-read command"); - return nbdplug_mark_dead (h); - } - if (offset < trans->offset || offset > INT64_MAX || - offset + len > trans->offset + count) { - nbdkit_error ("structured read reply with unexpected offset/length"); - return nbdplug_mark_dead (h); - } - buf = (char *) buf + offset - trans->offset; - if (zero) { - memset (buf, 0, len); - buf = NULL; - } - else - count = len; - } - - /* Thanks to structured replies, we must preserve an error in any - earlier chunk for replay during the final chunk. */ - if (!more) { - *trans_out = trans; - if (!error) - error = trans->err; - } - else if (error && !trans->err) - trans->err = error; - - /* Convert from wire value to local errno, and perform any final read */ - switch (error) { - case NBD_SUCCESS: - if (buf && read_full (h->fd, buf, count)) - return nbdplug_mark_dead (h); - return 0; - case NBD_EPERM: - return EPERM; - case NBD_EIO: - return EIO; - case NBD_ENOMEM: - return ENOMEM; - default: - nbdkit_debug ("unexpected error %d, squashing to EINVAL", error); - /* fallthrough */ - case NBD_EINVAL: - return EINVAL; - case NBD_ENOSPC: - return ENOSPC; - case NBD_EOVERFLOW: - return EOVERFLOW; - case NBD_ESHUTDOWN: - return ESHUTDOWN; - } -} - /* Reader loop. 
*/ void * nbdplug_reader (void *handle) { struct handle *h = handle; - bool done = false; int r; - while (!done) { - struct transaction *trans; - - r = nbdplug_reply_raw (h, &trans); - if (r >= 0) { - if (!trans) - nbdkit_debug ("partial reply handled, waiting for final reply"); - else { - trans->err = r; + while (!nbd_aio_is_dead (h->nbd) && !nbd_aio_is_closed (h->nbd)) { + struct pollfd fds[2] = { + [0].fd = h->fd, + [1].fd = h->fds[0], + [1].events = POLLIN, + }; + struct transaction *trans, **prev; + int dir; + char c; + + dir = nbd_aio_get_direction (h->nbd); + nbdkit_debug ("polling, dir=%d", dir); + if (dir & LIBNBD_AIO_DIRECTION_READ) + fds[0].events |= POLLIN; + if (dir & LIBNBD_AIO_DIRECTION_WRITE) + fds[0].events |= POLLOUT; + if (poll (fds, 2, -1) == -1) { + nbdkit_error ("poll: %m"); + break; + } + + if (dir & LIBNBD_AIO_DIRECTION_READ && fds[0].revents & POLLIN) + nbd_aio_notify_read (h->nbd); + else if (dir & LIBNBD_AIO_DIRECTION_WRITE && fds[0].revents & POLLOUT) + nbd_aio_notify_write (h->nbd); + + /* Check if we were kicked because a command was started */ + if (fds[1].revents & POLLIN && read (h->fds[0], &c, 1) != 1) { + nbdkit_error ("failed to read pipe: %m"); + break; + } + + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); + trans = h->trans; + prev = &h->trans; + while (trans) { + r = nbd_aio_command_completed (h->nbd, trans->cookie); + if (r == -1) { + nbdkit_debug ("transaction %" PRId64 " failed: %s", trans->cookie, + nbd_get_error ()); + trans->err = nbd_get_errno (); + if (!trans->err) + trans->err = EIO; + } + if (r) { + nbdkit_debug ("cookie %" PRId64 " completed state machine, status %d", + trans->cookie, trans->err); + *prev = trans->next; if (sem_post (&trans->sem)) { nbdkit_error ("failed to post semaphore: %m"); abort (); } } + else + prev = &trans->next; + trans = *prev; } - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); - done = h->dead; } /* Clean up any stranded in-flight requests */ - r = ESHUTDOWN; + nbdkit_debug ("state 
machine changed to %s", nbd_connection_state (h->nbd)); while (1) { struct transaction *trans; @@ -676,15 +276,63 @@ nbdplug_reader (void *handle) } if (!trans) break; - trans->err = r; + r = nbd_aio_command_completed (h->nbd, trans->cookie); + if (r == -1) { + nbdkit_debug ("transaction %" PRId64 " failed: %s", trans->cookie, + nbd_get_error ()); + trans->err = nbd_get_errno (); + if (!trans->err) + trans->err = ESHUTDOWN; + } + else if (!r) + trans->err = ESHUTDOWN; if (sem_post (&trans->sem)) { nbdkit_error ("failed to post semaphore: %m"); abort (); } } + nbdkit_debug ("exiting state machine thread"); return NULL; } +/* Register a cookie and return a transaction. */ +static struct transaction * +nbdplug_register (struct handle *h, int64_t cookie) +{ + struct transaction *trans; + char c = 0; + + if (cookie == -1) { + nbdkit_error ("command failed: %s", nbd_get_error ()); + errno = nbd_get_errno (); + return NULL; + } + + nbdkit_debug ("cookie %" PRId64 " started by state machine", cookie); + trans = calloc (1, sizeof *trans); + if (!trans) { + nbdkit_error ("unable to track transaction: %m"); + return NULL; + } + + /* While locked, kick the reader thread and add our transaction */ + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); + if (write (h->fds[1], &c, 1) != 1) { + nbdkit_error ("write to pipe: %m"); + free (trans); + return NULL; + } + if (sem_init (&trans->sem, 0, 0)) { + nbdkit_error ("unable to create semaphore: %m"); + free (trans); + return NULL; + } + trans->cookie = cookie; + trans->next = h->trans; + h->trans = trans; + return trans; +} + /* Perform the reply half of a transaction. */ static int nbdplug_reply (struct handle *h, struct transaction *trans) @@ -711,402 +359,60 @@ nbdplug_reply (struct handle *h, struct transaction *trans) return err ? -1 : 0; } -/* Receive response to @option into @reply, and consume any - payload. If @payload is non-NULL, caller must free *payload. 
Return - 0 on success, or -1 if communication to server is no longer - possible. */ -static int -nbdplug_newstyle_recv_option_reply (struct handle *h, uint32_t option, - struct fixed_new_option_reply *reply, - void **payload) -{ - CLEANUP_FREE char *buffer = NULL; - - if (payload) - *payload = NULL; - if (read_full (h->fd, reply, sizeof *reply)) { - nbdkit_error ("unable to read option reply: %m"); - return -1; - } - reply->magic = be64toh (reply->magic); - reply->option = be32toh (reply->option); - reply->reply = be32toh (reply->reply); - reply->replylen = be32toh (reply->replylen); - if (reply->magic != NBD_REP_MAGIC || reply->option != option) { - nbdkit_error ("unexpected option reply"); - return -1; - } - if (reply->replylen) { - if (reply->reply == NBD_REP_ACK) { - nbdkit_error ("NBD_REP_ACK should not have replylen %" PRId32, - reply->replylen); - return -1; - } - if (reply->replylen > 16 * 1024 * 1024) { - nbdkit_error ("option reply length is suspiciously large: %" PRId32, - reply->replylen); - return -1; - } - /* buffer is a string for NBD_REP_ERR_*; adding a NUL terminator - makes that string easier to use, without hurting other reply - types where buffer is not a string */ - buffer = malloc (reply->replylen + 1); - if (!buffer) { - nbdkit_error ("malloc: %m"); - return -1; - } - if (read_full (h->fd, buffer, reply->replylen)) { - nbdkit_error ("unable to read option reply payload: %m"); - return -1; - } - buffer[reply->replylen] = '\0'; - if (!payload) - nbdkit_debug ("ignoring option reply payload"); - else { - *payload = buffer; - buffer = NULL; - } - } - return 0; -} - -/* Attempt to negotiate structured reads, block status, and NBD_OPT_GO. - Return 1 if haggling completed, 0 if haggling failed but - NBD_OPT_EXPORT_NAME is still viable, or -1 on inability to connect. 
*/ -static int -nbdplug_newstyle_haggle (struct handle *h) -{ - const char *const query = "base:allocation"; - struct new_option opt; - uint32_t exportnamelen = htobe32 (strlen (export)); - uint32_t nrqueries = htobe32 (1); - uint32_t querylen = htobe32 (strlen (query)); - /* For now, we make no NBD_INFO_* requests, relying on the server to - send its defaults. TODO: nbdkit should let plugins report block - sizes, at which point we should request NBD_INFO_BLOCK_SIZE and - obey any sizes set by server. */ - uint16_t nrinfos = htobe16 (0); - struct fixed_new_option_reply reply; - - nbdkit_debug ("trying NBD_OPT_STRUCTURED_REPLY"); - opt.version = htobe64 (NEW_VERSION); - opt.option = htobe32 (NBD_OPT_STRUCTURED_REPLY); - opt.optlen = htobe32 (0); - if (write_full (h->fd, &opt, sizeof opt)) { - nbdkit_error ("unable to request NBD_OPT_STRUCTURED_REPLY: %m"); - return -1; - } - if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_STRUCTURED_REPLY, &reply, - NULL) < 0) - return -1; - if (reply.reply == NBD_REP_ACK) { - nbdkit_debug ("structured replies enabled, trying NBD_OPT_SET_META_CONTEXT"); - h->structured = true; - - opt.version = htobe64 (NEW_VERSION); - opt.option = htobe32 (NBD_OPT_SET_META_CONTEXT); - opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) + - sizeof nrqueries + sizeof querylen + strlen (query)); - if (write_full (h->fd, &opt, sizeof opt) || - write_full (h->fd, &exportnamelen, sizeof exportnamelen) || - write_full (h->fd, export, strlen (export)) || - write_full (h->fd, &nrqueries, sizeof nrqueries) || - write_full (h->fd, &querylen, sizeof querylen) || - write_full (h->fd, query, strlen (query))) { - nbdkit_error ("unable to request NBD_OPT_SET_META_CONTEXT: %m"); - return -1; - } - if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, - NULL) < 0) - return -1; - if (reply.reply == NBD_REP_META_CONTEXT) { - /* Cheat: we asked for exactly one context. 
We could double - check that the server is replying with exactly the - "base:allocation" context, and then remember the id it tells - us to later confirm that responses to NBD_CMD_BLOCK_STATUS - match up; but in the absence of multiple contexts, it's - easier to just assume the server is compliant, and will reuse - the same id, without bothering to check further. */ - nbdkit_debug ("extents enabled"); - h->extents = true; - if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, - &reply, NULL) < 0) - return -1; - } - if (reply.reply != NBD_REP_ACK) { - if (h->extents) { - nbdkit_error ("unexpected response to set meta context"); - return -1; - } - nbdkit_debug ("ignoring meta context response %s", - name_of_nbd_rep (reply.reply)); - } - } - else { - nbdkit_debug ("structured replies disabled"); - } - - /* Try NBD_OPT_GO */ - nbdkit_debug ("trying NBD_OPT_GO"); - opt.version = htobe64 (NEW_VERSION); - opt.option = htobe32 (NBD_OPT_GO); - opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) + - sizeof nrinfos); - if (write_full (h->fd, &opt, sizeof opt) || - write_full (h->fd, &exportnamelen, sizeof exportnamelen) || - write_full (h->fd, export, strlen (export)) || - write_full (h->fd, &nrinfos, sizeof nrinfos)) { - nbdkit_error ("unable to request NBD_OPT_GO: %m"); - return -1; - } - while (1) { - CLEANUP_FREE void *buffer; - struct fixed_new_option_reply_info_export *reply_export; - uint16_t info; - - if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_GO, &reply, &buffer) < 0) - return -1; - switch (reply.reply) { - case NBD_REP_INFO: - /* Parse payload, but ignore all except NBD_INFO_EXPORT */ - if (reply.replylen < 2) { - nbdkit_error ("NBD_REP_INFO reply too short"); - return -1; - } - memcpy (&info, buffer, sizeof info); - info = be16toh (info); - switch (info) { - case NBD_INFO_EXPORT: - if (reply.replylen != sizeof *reply_export) { - nbdkit_error ("NBD_INFO_EXPORT reply wrong size"); - return -1; - } - reply_export = buffer; - h->size = 
be64toh (reply_export->exportsize); - h->flags = be16toh (reply_export->eflags); - break; - default: - nbdkit_debug ("ignoring server info %d", info); - } - break; - case NBD_REP_ACK: - /* End of replies, valid if server already sent NBD_INFO_EXPORT, - observable since h->flags must contain NBD_FLAG_HAS_FLAGS */ - assert (!buffer); - if (!h->flags) { - nbdkit_error ("server omitted NBD_INFO_EXPORT reply to NBD_OPT_GO"); - return -1; - } - nbdkit_debug ("NBD_OPT_GO complete"); - return 1; - case NBD_REP_ERR_UNSUP: - /* Special case this failure to fall back to NBD_OPT_EXPORT_NAME */ - nbdkit_debug ("server lacks NBD_OPT_GO support"); - return 0; - default: - /* Unexpected. Either the server sent a legitimate error or an - unexpected reply, but either way, we can't connect. */ - if (NBD_REP_IS_ERR (reply.reply)) - if (reply.replylen) - nbdkit_error ("server rejected NBD_OPT_GO with %s: %s", - name_of_nbd_rep (reply.reply), (char *) buffer); - else - nbdkit_error ("server rejected NBD_OPT_GO with %s", - name_of_nbd_rep (reply.reply)); - else - nbdkit_error ("server used unexpected reply %s to NBD_OPT_GO", - name_of_nbd_rep (reply.reply)); - return -1; - } - } -} - -/* Connect to a Unix socket, returning the fd on success */ -static int -nbdplug_connect_unix (void) -{ - struct sockaddr_un sock = { .sun_family = AF_UNIX }; - int fd; - - nbdkit_debug ("connecting to Unix socket name=%s", sockname); - fd = socket (AF_UNIX, SOCK_STREAM, 0); - if (fd < 0) { - nbdkit_error ("socket: %m"); - return -1; - } - - /* We already validated length during nbdplug_config_complete */ - assert (strlen (sockname) <= sizeof sock.sun_path); - memcpy (sock.sun_path, sockname, strlen (sockname)); - if (connect (fd, (const struct sockaddr *) &sock, sizeof sock) < 0) { - nbdkit_error ("connect: %m"); - return -1; - } - return fd; -} - -/* Connect to a TCP socket, returning the fd on success */ -static int -nbdplug_connect_tcp (void) -{ - struct addrinfo hints = { .ai_family = AF_UNSPEC, - 
.ai_socktype = SOCK_STREAM, }; - struct addrinfo *result, *rp; - int r; - const int optval = 1; - int fd; - - nbdkit_debug ("connecting to TCP socket host=%s port=%s", hostname, port); - r = getaddrinfo (hostname, port, &hints, &result); - if (r != 0) { - nbdkit_error ("getaddrinfo: %s", gai_strerror (r)); - return -1; - } - - assert (result != NULL); - - for (rp = result; rp; rp = rp->ai_next) { - fd = socket (rp->ai_family, rp->ai_socktype, rp->ai_protocol); - if (fd == -1) - continue; - if (connect (fd, rp->ai_addr, rp->ai_addrlen) != -1) - break; - close (fd); - } - freeaddrinfo (result); - if (rp == NULL) { - nbdkit_error ("connect: %m"); - close (fd); - return -1; - } - - if (setsockopt (fd, IPPROTO_TCP, TCP_NODELAY, &optval, - sizeof (int)) == -1) { - nbdkit_error ("cannot set TCP_NODELAY option: %m"); - close (fd); - return -1; - } - return fd; -} - /* Create the shared or per-connection handle. */ static struct handle * nbdplug_open_handle (int readonly) { struct handle *h; - struct old_handshake old; - uint64_t version; + int r; h = calloc (1, sizeof *h); if (h == NULL) { nbdkit_error ("malloc: %m"); return NULL; } + if (pipe (h->fds)) { + nbdkit_error ("pipe: %m"); + free (h); + return NULL; + } retry: + h->fd = -1; + h->nbd = nbd_create (); + if (!h->nbd) + goto err; + if (nbd_set_export_name (h->nbd, export) == -1) + goto err; + if (nbd_add_meta_context (h->nbd, "base:allocation") == -1) + goto err; if (sockname) - h->fd = nbdplug_connect_unix (); + r = nbd_connect_unix (h->nbd, sockname); else - h->fd = nbdplug_connect_tcp (); - if (h->fd == -1) { + r = nbd_connect_tcp (h->nbd, hostname, port); + if (r == -1) { if (retry--) { + nbdkit_debug ("connect failed; will try again: %s", nbd_get_error ()); + nbd_close (h->nbd); sleep (1); goto retry; } goto err; } - - /* old and new handshake share same meaning of first 16 bytes */ - if (read_full (h->fd, &old, offsetof (struct old_handshake, exportsize))) { - nbdkit_error ("unable to read magic: %m"); - goto 
err; - } - if (strncmp (old.nbdmagic, "NBDMAGIC", sizeof old.nbdmagic)) { - nbdkit_error ("wrong magic, %s is not an NBD server", servname); + h->fd = nbd_aio_get_fd (h->nbd); + if (h->fd == -1) goto err; - } - version = be64toh (old.version); - if (version == OLD_VERSION) { - nbdkit_debug ("trying oldstyle connection"); - if (read_full (h->fd, - (char *) &old + offsetof (struct old_handshake, exportsize), - sizeof old - offsetof (struct old_handshake, exportsize))) { - nbdkit_error ("unable to read old handshake: %m"); - goto err; - } - h->size = be64toh (old.exportsize); - h->flags = be16toh (old.eflags); - } - else if (version == NEW_VERSION) { - uint16_t gflags; - uint32_t cflags; - struct new_option opt; - struct new_handshake_finish finish; - size_t expect; - nbdkit_debug ("trying newstyle connection"); - if (read_full (h->fd, &gflags, sizeof gflags)) { - nbdkit_error ("unable to read global flags: %m"); - goto err; - } - gflags = be16toh (gflags); - cflags = htobe32 (gflags & (NBD_FLAG_FIXED_NEWSTYLE | NBD_FLAG_NO_ZEROES)); - if (write_full (h->fd, &cflags, sizeof cflags)) { - nbdkit_error ("unable to return global flags: %m"); - goto err; - } - - /* Prefer NBD_OPT_GO if possible */ - if (gflags & NBD_FLAG_FIXED_NEWSTYLE) { - int rc = nbdplug_newstyle_haggle (h); - if (rc < 0) - goto err; - if (!rc) - goto export_name; - } - else { - export_name: - /* Option haggling untried or failed, use older NBD_OPT_EXPORT_NAME */ - nbdkit_debug ("trying NBD_OPT_EXPORT_NAME"); - opt.version = htobe64 (NEW_VERSION); - opt.option = htobe32 (NBD_OPT_EXPORT_NAME); - opt.optlen = htobe32 (strlen (export)); - if (write_full (h->fd, &opt, sizeof opt) || - write_full (h->fd, export, strlen (export))) { - nbdkit_error ("unable to request export '%s': %m", export); - goto err; - } - expect = sizeof finish; - if (gflags & NBD_FLAG_NO_ZEROES) - expect -= sizeof finish.zeroes; - if (read_full (h->fd, &finish, expect)) { - nbdkit_error ("unable to read new handshake: %m"); - goto err; 
- } - h->size = be64toh (finish.exportsize); - h->flags = be16toh (finish.eflags); - } - } - else { - nbdkit_error ("unexpected version %#" PRIx64, version); - goto err; - } if (readonly) - h->flags |= NBD_FLAG_READ_ONLY; + h->readonly = true; /* Spawn a dedicated reader thread */ - if ((errno = pthread_mutex_init (&h->write_lock, NULL))) { - nbdkit_error ("failed to initialize write mutex: %m"); - goto err; - } if ((errno = pthread_mutex_init (&h->trans_lock, NULL))) { nbdkit_error ("failed to initialize transaction mutex: %m"); - pthread_mutex_destroy (&h->write_lock); goto err; } if ((errno = pthread_create (&h->reader, NULL, nbdplug_reader, h))) { nbdkit_error ("failed to initialize reader thread: %m"); - pthread_mutex_destroy (&h->write_lock); pthread_mutex_destroy (&h->trans_lock); goto err; } @@ -1114,8 +420,11 @@ nbdplug_open_handle (int readonly) return h; err: - if (h->fd >= 0) - close (h->fd); + close (h->fds[0]); + close (h->fds[1]); + nbdkit_error ("failure while creating nbd handle: %s", nbd_get_error ()); + if (h->nbd) + nbd_close (h->nbd); free (h); return NULL; } @@ -1133,14 +442,13 @@ nbdplug_open (int readonly) static void nbdplug_close_handle (struct handle *h) { - if (!h->dead) { - nbdplug_request_raw (h, 0, NBD_CMD_DISC, 0, 0, 0, NULL); - shutdown (h->fd, SHUT_WR); - } + if (nbd_shutdown (h->nbd) == -1) + nbdkit_debug ("failed to clean up handle: %s", nbd_get_error ()); if ((errno = pthread_join (h->reader, NULL))) nbdkit_debug ("failed to join reader thread: %m"); - close (h->fd); - pthread_mutex_destroy (&h->write_lock); + close (h->fds[0]); + close (h->fds[1]); + nbd_close (h->nbd); pthread_mutex_destroy (&h->trans_lock); free (h); } @@ -1155,87 +463,137 @@ nbdplug_close (void *handle) nbdplug_close_handle (h); } + + /* Get the file size. 
*/ static int64_t nbdplug_get_size (void *handle) { struct handle *h = handle; + int64_t size = nbd_get_size (h->nbd); - return h->size; + if (size == -1) { + nbdkit_error ("failure to get size: %s", nbd_get_error ()); + return -1; + } + return size; } static int nbdplug_can_write (void *handle) { struct handle *h = handle; + int i = nbd_read_only (h->nbd); - return !(h->flags & NBD_FLAG_READ_ONLY); + if (i == -1) { + nbdkit_error ("failure to check readonly flag: %s", nbd_get_error ()); + return -1; + } + return !(i || h->readonly); } static int nbdplug_can_flush (void *handle) { struct handle *h = handle; + int i = nbd_can_flush (h->nbd); - return h->flags & NBD_FLAG_SEND_FLUSH; + if (i == -1) { + nbdkit_error ("failure to check flush flag: %s", nbd_get_error ()); + return -1; + } + return i; } static int nbdplug_is_rotational (void *handle) { struct handle *h = handle; + int i = nbd_is_rotational (h->nbd); - return h->flags & NBD_FLAG_ROTATIONAL; + if (i == -1) { + nbdkit_error ("failure to check rotational flag: %s", nbd_get_error ()); + return -1; + } + return i; } static int nbdplug_can_trim (void *handle) { struct handle *h = handle; + int i = nbd_can_trim (h->nbd); - return h->flags & NBD_FLAG_SEND_TRIM; + if (i == -1) { + nbdkit_error ("failure to check trim flag: %s", nbd_get_error ()); + return -1; + } + return i; } static int nbdplug_can_zero (void *handle) { struct handle *h = handle; + int i = nbd_can_zero (h->nbd); - return h->flags & NBD_FLAG_SEND_WRITE_ZEROES; + if (i == -1) { + nbdkit_error ("failure to check zero flag: %s", nbd_get_error ()); + return -1; + } + return i; } static int nbdplug_can_fua (void *handle) { struct handle *h = handle; + int i = nbd_can_fua (h->nbd); - return h->flags & NBD_FLAG_SEND_FUA ? NBDKIT_FUA_NATIVE : NBDKIT_FUA_NONE; + if (i == -1) { + nbdkit_error ("failure to check fua flag: %s", nbd_get_error ()); + return -1; + } + return i ? 
NBDKIT_FUA_NATIVE : NBDKIT_FUA_NONE; } static int nbdplug_can_multi_conn (void *handle) { struct handle *h = handle; + int i = nbd_can_multi_conn (h->nbd); - return h->flags & NBD_FLAG_CAN_MULTI_CONN; + if (i == -1) { + nbdkit_error ("failure to check multi-conn flag: %s", nbd_get_error ()); + return -1; + } + return i; } static int nbdplug_can_cache (void *handle) { struct handle *h = handle; + int i = nbd_can_cache (h->nbd); - if (h->flags & NBD_FLAG_SEND_CACHE) - return NBDKIT_CACHE_NATIVE; - return NBDKIT_CACHE_NONE; + if (i == -1) { + nbdkit_error ("failure to check cache flag: %s", nbd_get_error ()); + return -1; + } + return i ? NBDKIT_CACHE_NATIVE : NBDKIT_CACHE_NONE; } static int nbdplug_can_extents (void *handle) { struct handle *h = handle; + int i = nbd_can_meta_context (h->nbd, "base:allocation"); - return h->extents; + if (i == -1) { + nbdkit_error ("failure to check extents ability: %s", nbd_get_error ()); + return -1; + } + return i; } /* Read data from the file. */ @@ -1247,7 +605,7 @@ nbdplug_pread (void *handle, void *buf, uint32_t count, uint64_t offset, struct transaction *s; assert (!flags); - s = nbdplug_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf, NULL); + s = nbdplug_register (h, nbd_aio_pread (h->nbd, buf, count, offset, 0)); return nbdplug_reply (h, s); } @@ -1258,10 +616,10 @@ nbdplug_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, { struct handle *h = handle; struct transaction *s; + uint32_t f = flags & NBDKIT_FLAG_FUA ? LIBNBD_CMD_FLAG_FUA : 0; assert (!(flags & ~NBDKIT_FLAG_FUA)); - s = nbdplug_request_full (h, flags & NBDKIT_FLAG_FUA ? 
NBD_CMD_FLAG_FUA : 0, - NBD_CMD_WRITE, offset, count, buf, NULL, NULL); + s = nbdplug_register (h, nbd_aio_pwrite (h->nbd, buf, count, offset, f)); return nbdplug_reply (h, s); } @@ -1274,13 +632,12 @@ nbdplug_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) uint32_t f = 0; assert (!(flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM))); - assert (h->flags & NBD_FLAG_SEND_WRITE_ZEROES); if (!(flags & NBDKIT_FLAG_MAY_TRIM)) - f |= NBD_CMD_FLAG_NO_HOLE; + f |= LIBNBD_CMD_FLAG_NO_HOLE; if (flags & NBDKIT_FLAG_FUA) - f |= NBD_CMD_FLAG_FUA; - s = nbdplug_request (h, f, NBD_CMD_WRITE_ZEROES, offset, count); + f |= LIBNBD_CMD_FLAG_FUA; + s = nbdplug_register (h, nbd_aio_zero (h->nbd, count, offset, f)); return nbdplug_reply (h, s); } @@ -1290,10 +647,10 @@ nbdplug_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags) { struct handle *h = handle; struct transaction *s; + uint32_t f = flags & NBDKIT_FLAG_FUA ? LIBNBD_CMD_FLAG_FUA : 0; assert (!(flags & ~NBDKIT_FLAG_FUA)); - s = nbdplug_request (h, flags & NBDKIT_FLAG_FUA ? 
NBD_CMD_FLAG_FUA : 0, - NBD_CMD_TRIM, offset, count); + s = nbdplug_register (h, nbd_aio_trim (h->nbd, count, offset, f)); return nbdplug_reply (h, s); } @@ -1305,10 +662,29 @@ nbdplug_flush (void *handle, uint32_t flags) struct transaction *s; assert (!flags); - s = nbdplug_request (h, 0, NBD_CMD_FLUSH, 0, 0); + s = nbdplug_register (h, nbd_aio_flush (h->nbd, 0)); return nbdplug_reply (h, s); } +static int +nbdplug_extent (void *opaque, const char *metacontext, uint64_t offset, + uint32_t *entries, size_t nr_entries) +{ + struct nbdkit_extents *extents = opaque; + + assert (strcmp (metacontext, "base:allocation") == 0); + assert (nr_entries % 2 == 0); + while (nr_entries) { + /* We rely on the fact that NBDKIT_EXTENT_* match NBD_STATE_* */ + if (nbdkit_add_extent (extents, offset, entries[0], entries[1]) == -1) + return -1; + offset += entries[0]; + entries += 2; + nr_entries -= 2; + } + return 0; +} + /* Read extents of the file. */ static int nbdplug_extents (void *handle, uint32_t count, uint64_t offset, @@ -1316,11 +692,11 @@ nbdplug_extents (void *handle, uint32_t count, uint64_t offset, { struct handle *h = handle; struct transaction *s; - uint32_t f = flags & NBDKIT_FLAG_REQ_ONE ? NBD_CMD_FLAG_REQ_ONE : 0; + uint32_t f = flags & NBDKIT_FLAG_REQ_ONE ? 
LIBNBD_CMD_FLAG_REQ_ONE : 0; - assert (!(flags & ~NBDKIT_FLAG_REQ_ONE) && h->extents); - s = nbdplug_request_full (h, f, NBD_CMD_BLOCK_STATUS, offset, count, NULL, - NULL, extents); + assert (!(flags & ~NBDKIT_FLAG_REQ_ONE)); + s = nbdplug_register (h, nbd_aio_block_status (h->nbd, count, offset, + extents, nbdplug_extent, f)); return nbdplug_reply (h, s); } @@ -1332,7 +708,7 @@ nbdplug_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) struct transaction *s; assert (!flags); - s = nbdplug_request (h, 0, NBD_CMD_CACHE, offset, count); + s = nbdplug_register (h, nbd_aio_cache (h->nbd, count, offset, 0)); return nbdplug_reply (h, s); } diff --git a/plugins/nbd/Makefile.am b/plugins/nbd/Makefile.am index bfc2a838..a854a448 100644 --- a/plugins/nbd/Makefile.am +++ b/plugins/nbd/Makefile.am @@ -41,7 +41,6 @@ nbdkit_nbd_plugin_la_SOURCES = \ nbdkit_nbd_plugin_la_CPPFLAGS = \ -I$(top_srcdir)/include \ -I$(top_srcdir)/common/include \ - -I$(top_srcdir)/common/protocol \ -I$(top_srcdir)/common/utils \ -I$(top_srcdir)/server nbdkit_nbd_plugin_la_CFLAGS = \ @@ -50,7 +49,6 @@ nbdkit_nbd_plugin_la_LDFLAGS = \ -module -avoid-version -shared \ -Wl,--version-script=$(top_srcdir)/plugins/plugins.syms nbdkit_nbd_plugin_la_LIBADD = \ - $(top_builddir)/common/protocol/libprotocol.la \ $(top_builddir)/common/utils/libutils.la # TODO: drop standalone version, which is locked at nbdkit 1.13.4 behavior, @@ -62,9 +60,15 @@ nbdkit_nbd_plugin_la_CFLAGS += \ $(LIBNBD_CFLAGS) nbdkit_nbd_plugin_la_LIBADD += \ $(LIBNBD_LIBS) + else !HAVE_LIBNBD nbdkit_nbd_plugin_la_SOURCES += \ nbd-standalone.c +nbdkit_nbd_plugin_la_CPPFLAGS += \ + -I$(top_srcdir)/common/protocol +nbdkit_nbd_plugin_la_LIBADD += \ + $(top_builddir)/common/protocol/libprotocol.la + endif !HAVE_LIBNBD if HAVE_POD -- 2.20.1
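The pair-walking loop that nbdplug_extent uses to turn libnbd's flat entries[] array into nbdkit extents can be tried without either library. What follows is only a sketch under stated assumptions: struct extent_log and record_extent() are hypothetical stand-ins for struct nbdkit_extents and nbdkit_add_extent(), and walk_entries() is a made-up name for the loop body. It is meant to show that entries[] holds (length, flags) pairs and that the running offset advances by each length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nbdkit_extents: a small fixed log. */
struct extent { uint64_t offset; uint64_t length; uint32_t type; };
struct extent_log { struct extent e[16]; size_t n; };

/* Hypothetical stand-in for nbdkit_add_extent(); returns -1 when full. */
static int
record_extent (struct extent_log *log, uint64_t offset, uint64_t length,
               uint32_t type)
{
  if (log->n == sizeof log->e / sizeof log->e[0])
    return -1;
  log->e[log->n].offset = offset;
  log->e[log->n].length = length;
  log->e[log->n].type = type;
  log->n++;
  return 0;
}

/* Same loop shape as nbdplug_extent in the patch: consume two array
 * slots per extent (length, then flags) and advance the offset. */
static int
walk_entries (struct extent_log *log, uint64_t offset,
              const uint32_t *entries, size_t nr_entries)
{
  assert (nr_entries % 2 == 0);
  while (nr_entries) {
    if (record_extent (log, offset, entries[0], entries[1]) == -1)
      return -1;
    offset += entries[0];
    entries += 2;
    nr_entries -= 2;
  }
  return 0;
}
```

Calling walk_entries() with a starting offset of 65536 and the four-element array { 4096, 0, 8192, 3 } should record two extents, the second starting at 65536 + 4096.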
Eric Blake
2019-Jun-12 21:00 UTC
[Libguestfs] [nbdkit PATCH v3 4/5] nbd: Add magic uri=... parameter
Allowing 'nbdkit nbd nbd://localhost:10810/foo' is nicer than the existing 'nbdkit nbd hostname=localhost port=10810 export=foo'. Also, it is useful to let the user learn if nbdkit was built against libnbd and in turn if libnbd used libxml2 for URI support. Thankfully, the magic parameter support in nbdkit recognizes nbd:///?socket=/path/to/sock as a parameter needing magic treatment; however, this form of URI includes a shell glob character, so it is still wise to quote the ? to prevent unintended results for (very unusual) file system contents reached from the local directory. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbdkit-nbd-plugin.pod | 46 +++++++++++++++++++----- plugins/nbd/nbd.c | 59 +++++++++++++++++++++++++++---- 2 files changed, 90 insertions(+), 15 deletions(-) diff --git a/plugins/nbd/nbdkit-nbd-plugin.pod b/plugins/nbd/nbdkit-nbd-plugin.pod index 7baff981..98f45a35 100644 --- a/plugins/nbd/nbdkit-nbd-plugin.pod +++ b/plugins/nbd/nbdkit-nbd-plugin.pod @@ -4,8 +4,8 @@ nbdkit-nbd-plugin - nbdkit nbd plugin =head1 SYNOPSIS - nbdkit nbd { socket=SOCKNAME | hostname=HOST [port=PORT] } [export=NAME] - [retry=N] [shared=BOOL] + nbdkit nbd { socket=SOCKNAME | hostname=HOST [port=PORT] | [uri=]URI } + [export=NAME] [retry=N] [shared=BOOL] =head1 DESCRIPTION @@ -27,6 +27,14 @@ is feasible that future additions will support encryption. =head1 PARAMETERS +One of B<socket>, B<hostname> or B<uri> must be provided to designate +the server. The server can speak either new or old style +protocol. C<uri=> is a magic config key and may be omitted in most +cases. See L<nbdkit(1)/Magic parameters>. + +The following parameters are available whether or not the plugin was +compiled against libnbd: + =over 4 =item B<socket=>SOCKNAME @@ -42,14 +50,12 @@ Connect to the NBD server at the given remote C<HOST> using a TCP socket. When B<hostname> is supplied, use B<PORT> instead of the default port 10809. -Either B<socket> or B<hostname> must be provided. 
The server can speak -either new or old style protocol. - =item B<export=>NAME If this parameter is given, and the server speaks new style protocol, then connect to the named export instead of the default export (the -empty string). +empty string). If C<uri> is supplied, the export name should be +embedded in the URI instead. =item B<retry=>N @@ -69,6 +75,24 @@ nbdkit will share that single connection. =back +The following parameters are only available if the plugin was compiled +against libnbd: + +=over 4 + +=item B<uri=>URI + +When B<uri> is supplied, decode B<URI> to determine the address to +connect to. A URI can specify a TCP connection (such as +C<nbd://localhost:10809/export>) or a Unix socket (such as +C<nbd+unix:///export?socket=/path/to/sock>). Remember to use proper +shell quoting to prevent B<URI> from accidentally being handled as a +shell glob. The B<uri> parameter is only available when the plugin was +compiled against libnbd with URI support; C<nbdkit --dump-plugin nbd> +will contain C<libnbd_uri=1> if this is the case. + +=back + =head1 EXAMPLES Expose the contents of an export served by an old style server over a @@ -92,8 +116,8 @@ C<qemu-nbd> without B<-t> normally quits after the first client, and utilize a 5-second retry to give qemu-nbd time to create the socket: ( sock=`mktemp -u` - nbdkit --exit-with-parent --filter=partition nbd socket=$sock \ - shared=1 retry=5 partition=1 & + nbdkit --exit-with-parent --filter=partition nbd \ + nbd+unix:///\?socket=$sock shared=1 retry=5 partition=1 & exec qemu-nbd -k $sock -f qcow2 /path/to/image.qcow2 ) Conversely, expose the contents of export I<foo> from a new style @@ -108,6 +132,11 @@ client exits. 
│ old client │ ────────▶│ nbdkit │ ────────▶│ new server │ └────────────┘ Unix └────────┘ TCP └────────────┘ +Learn which features are provided by libnbd by inspecting the +C<libnbd_*> lines: + + nbdkit --dump-plugin nbd + =head1 SEE ALSO L<nbdkit(1)>, @@ -115,6 +144,7 @@ L<nbdkit-captive(1)>, L<nbdkit-filter(3)>, L<nbdkit-tls(1)>, L<nbdkit-plugin(3)>, +L<libnbd(3)>, L<qemu-nbd(1)>. =head1 AUTHORS diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 5d79c981..0f54805c 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -87,6 +87,9 @@ static char *sockname; static const char *hostname; static const char *port; +/* Connect to server via URI */ +static const char *uri; + /* Name of export on remote server, default '', ignored for oldstyle */ static const char *export; @@ -109,9 +112,9 @@ nbdplug_unload (void) } /* Called for each key=value passed on the command line. This plugin - * accepts socket=<sockname> or hostname=<hostname>/port=<port> - * (exactly one connection required), and optional parameters - * export=<name>, retry=<n> and shared=<bool>. + * accepts socket=<sockname>, hostname=<hostname>/port=<port>, or + * [uri=]<uri> (exactly one connection required), and optional + * parameters export=<name>, retry=<n> and shared=<bool>. 
*/ static int nbdplug_config (const char *key, const char *value) @@ -130,6 +133,8 @@ nbdplug_config (const char *key, const char *value) hostname = value; else if (strcmp (key, "port") == 0) port = value; + else if (strcmp (key, "uri") == 0) + uri = value; else if (strcmp (key, "export") == 0) export = value; else if (strcmp (key, "retry") == 0) { @@ -165,19 +170,40 @@ nbdplug_config_complete (void) nbdkit_error ("cannot mix Unix socket and TCP hostname/port parameters"); return -1; } + else if (uri) { + nbdkit_error ("cannot mix Unix socket and URI parameters"); + return -1; + } if (strlen (sockname) > sizeof sock.sun_path) { nbdkit_error ("socket file name too large"); return -1; } } - else { - if (!hostname) { - nbdkit_error ("must supply socket= or hostname= of external NBD server"); + else if (hostname) { + if (uri) { + nbdkit_error ("cannot mix TCP hostname/port and URI parameters"); return -1; } if (!port) port = "10809"; } + else if (uri) { + struct nbd_handle *nbd = nbd_create (); + + if (!nbd) { + nbdkit_error ("unable to query libnbd details: %s", nbd_get_error ()); + return -1; + } + if (!nbd_supports_uri (nbd)) { + nbdkit_error ("libnbd was compiled without uri support"); + nbd_close (nbd); + return -1; + } + nbd_close (nbd); + } else { + nbdkit_error ("must supply socket=, hostname= or uri= of external NBD server"); + return -1; + } if (!export) export = ""; @@ -188,6 +214,7 @@ nbdplug_config_complete (void) } #define nbdplug_config_help \ + "[uri=]<URI> URI of an NBD socket to connect to (if supported).\n" \ "socket=<SOCKNAME> The Unix socket to connect to.\n" \ "hostname=<HOST> The hostname for the TCP socket to connect to.\n" \ "port=<PORT> TCP port or service name to use (default 10809).\n" \ @@ -196,6 +223,20 @@ nbdplug_config_complete (void) "shared=<BOOL> True to share one server connection among all clients,\n" \ " rather than a connection per client (default false).\n" \ +static void +nbdplug_dump_plugin (void) +{ + struct nbd_handle *nbd = 
nbd_create (); + + if (!nbd) { + nbdkit_error ("unable to query libnbd details: %s", nbd_get_error ()); + exit (EXIT_FAILURE); + } + printf ("libnbd_version=%s\n", nbd_get_version (nbd)); + printf ("libnbd_uri=%d\n", nbd_supports_uri (nbd)); + nbd_close (nbd); +} + #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL /* Reader loop. */ @@ -386,7 +427,9 @@ nbdplug_open_handle (int readonly) goto err; if (nbd_add_meta_context (h->nbd, "base:allocation") == -1) goto err; - if (sockname) + if (uri) + r = nbd_connect_uri (h->nbd, uri); + else if (sockname) r = nbd_connect_unix (h->nbd, sockname); else r = nbd_connect_tcp (h->nbd, hostname, port); @@ -720,6 +763,8 @@ static struct nbdkit_plugin plugin = { .config = nbdplug_config, .config_complete = nbdplug_config_complete, .config_help = nbdplug_config_help, + .magic_config_key = "uri", + .dump_plugin = nbdplug_dump_plugin, .open = nbdplug_open, .close = nbdplug_close, .get_size = nbdplug_get_size, -- 2.20.1
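The "exactly one connection required" rule that nbdplug_config_complete enforces for socket=, hostname= and uri= can be restated as a small pure function. This is a minimal sketch, not the plugin's code: enum conn_kind, pick_connection() and effective_port() are hypothetical names, and the sketch leaves out the checks the patch also performs (a stray port= alongside socket=, the sun_path length limit, and the nbd_supports_uri() probe).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; the real plugin keeps three static strings
 * and tests them directly in nbdplug_open_handle. */
enum conn_kind { CONN_ERROR = -1, CONN_UNIX, CONN_TCP, CONN_URI };

/* Accept the configuration only when exactly one of the three ways of
 * naming the server was supplied. */
static enum conn_kind
pick_connection (const char *sockname, const char *hostname, const char *uri)
{
  int given = (sockname != NULL) + (hostname != NULL) + (uri != NULL);

  if (given != 1)
    return CONN_ERROR;          /* none supplied, or a forbidden mix */
  if (sockname)
    return CONN_UNIX;
  if (hostname)
    return CONN_TCP;
  return CONN_URI;
}

/* The patch defaults port to "10809" when hostname= is used without it. */
static const char *
effective_port (const char *port)
{
  return port ? port : "10809";
}

/* Small usage demonstration. */
static void
self_check (void)
{
  assert (pick_connection (NULL, NULL, "nbd://localhost:10810/foo") == CONN_URI);
  assert (pick_connection ("/tmp/sock", "localhost", NULL) == CONN_ERROR);
  assert (strcmp (effective_port (NULL), "10809") == 0);
}
```

Because only one of the three can be set once this check has passed, the order in which nbdplug_open_handle tests uri, sockname and hostname does not change which connect call is made.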
Eric Blake
2019-Jun-12 21:00 UTC
[Libguestfs] [nbdkit PATCH v3 5/5] nbd: Add TLS client support
Well, really add parameters to pass on to libnbd which does all the heavy lifting :) This also adds testsuite coverage that we support TLS over Unix sockets. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbdkit-nbd-plugin.pod | 68 +++++++++++++++++++------ plugins/nbd/nbd.c | 77 ++++++++++++++++++++++++++++- TODO | 13 +---- tests/Makefile.am | 6 ++- tests/test-nbd-tls-psk.sh | 81 ++++++++++++++++++++++++++++++ tests/test-nbd-tls.sh | 82 +++++++++++++++++++++++++++++++ tests/test-tls-psk.sh | 2 +- tests/test-tls.sh | 4 +- 8 files changed, 302 insertions(+), 31 deletions(-) create mode 100755 tests/test-nbd-tls-psk.sh create mode 100755 tests/test-nbd-tls.sh diff --git a/plugins/nbd/nbdkit-nbd-plugin.pod b/plugins/nbd/nbdkit-nbd-plugin.pod index 98f45a35..f4b8093e 100644 --- a/plugins/nbd/nbdkit-nbd-plugin.pod +++ b/plugins/nbd/nbdkit-nbd-plugin.pod @@ -5,7 +5,8 @@ nbdkit-nbd-plugin - nbdkit nbd plugin =head1 SYNOPSIS nbdkit nbd { socket=SOCKNAME | hostname=HOST [port=PORT] | [uri=]URI } - [export=NAME] [retry=N] [shared=BOOL] + [export=NAME] [retry=N] [shared=BOOL] [tls=MODE] [tls-certificates=DIR] + [tls-verify=BOOL] [tls-username=NAME] [tls-psk=FILE] =head1 DESCRIPTION @@ -20,10 +21,9 @@ original server lacks it). Use of this plugin along with nbdkit filters (adding I<--filter> to the nbdkit command line) makes it possible to apply any nbdkit filter to any other NBD server. -For now, this is limited to connecting to another NBD server over an -unencrypted connection; if the data is sensitive, it is better to -stick to a Unix socket rather than transmitting plaintext over TCP. It -is feasible that future additions will support encryption. +Remember that when using this plugin as a bridge between an encrypted +and a non-encrypted endpoint, it is best to preserve encryption over +TCP and use plaintext only on a Unix socket. =head1 PARAMETERS @@ -91,6 +91,45 @@ shell glob. 
The B<uri> parameter is only available when the plugin was compiled against libnbd with URI support; C<nbdkit --dump-plugin nbd> will contain C<libnbd_uri=1> if this is the case. +=item B<tls=>MODE + +Selects which TLS mode to use with the server. If no other tls option +is present, this defaults to C<off>, where the client does not attempt +encryption (and may be rejected by a server that requires it). If +omitted but another tls option is present, this defaults to C<on>, +where the client opportunistically attempts a TLS handshake, but will +continue running unencrypted if the server does not support +encryption. If set to C<require>, this requires an encrypted +connection to the server. + +The B<tls> parameter is only available when the plugin was compiled +against libnbd with TLS support; C<nbdkit --dump-plugin nbd> will +contain C<libnbd_tls=1> if this is the case. Note the difference +between C<--tls=...> controlling what nbdkit serves, and C<tls=...> +controlling what the nbd plugin connects to as a client. + +=item B<tls-certificates=>DIR + +This specifies the directory containing X.509 client certificates to +present to the server. + +=item B<tls-verify=>BOOL + +This defaults to true; setting it to false disables server name +verification, which opens you to potential Man-in-the-Middle (MITM) +attacks, but allows for a simpler setup for distributing certificates. + +=item B<tls-username=>NAME + +If provided, this overrides the user name to present to the server +alongside the certificate. + +=item B<tls-psk=>FILE + +If provided, this is the filename containing the Pre-Shared Keys (PSK) +to present to the server. While this is easier to set up than X.509, +it requires that the PSK file be transmitted over a secure channel. + =back =head1 EXAMPLES @@ -104,9 +143,9 @@ that the old server exits. 
nbdkit --exit-with-parent --tls=require nbd socket=$sock & exec /path/to/oldserver --socket=$sock ) - ┌────────────┐ ┌────────┐ ┌────────────┐ - │ new client │ ────────▶│ nbdkit │ ────────▶│ old server │ - └────────────┘ TCP └────────┘ Unix └────────────┘ + ┌────────────┐ TLS ┌────────┐ plaintext ┌────────────┐ + │ new client │ ────────▶│ nbdkit │ ───────────▶│ old server │ + └────────────┘ TCP └────────┘ Unix └────────────┘ Combine nbdkit's partition filter with qemu-nbd's ability to visit qcow2 files (nbdkit does not have a native qcow2 plugin), performing @@ -121,16 +160,17 @@ utilize a 5-second retry to give qemu-nbd time to create the socket: exec qemu-nbd -k $sock -f qcow2 /path/to/image.qcow2 ) Conversely, expose the contents of export I<foo> from a new style -server with unencrypted data to a client that can only consume +server with encrypted data to a client that can only consume unencrypted old style. Use I<--run> to clean up nbdkit at the time the -client exits. +client exits. In general, note that it is best to keep the plaintext +connection limited to a Unix socket on the local machine. 
- nbdkit -U - -o nbd hostname=example.com export=foo \ + nbdkit -U - -o --tls=off nbd hostname=example.com export=foo tls=require \ --run '/path/to/oldclient --socket=$unixsocket' - ┌────────────┐ ┌────────┐ ┌────────────┐ - │ old client │ ────────▶│ nbdkit │ ────────▶│ new server │ - └────────────┘ Unix └────────┘ TCP └────────────┘ + ┌────────────┐ plaintext ┌────────┐ TLS ┌────────────┐ + │ old client │ ───────────▶│ nbdkit │ ────────▶│ new server │ + └────────────┘ Unix └────────┘ TCP └────────────┘ Learn which features are provided by libnbd by inspecting the C<libnbd_*> lines: diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 0f54805c..fa14ea1b 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -100,6 +100,13 @@ static unsigned long retry; static bool shared; static struct handle *shared_handle; +/* Control TLS settings */ +static int tls = -1; +static char *tls_certificates; +static int tls_verify = -1; +static const char *tls_username; +static char *tls_psk; + static struct handle *nbdplug_open_handle (int readonly); static void nbdplug_close_handle (struct handle *h); @@ -109,12 +116,15 @@ nbdplug_unload (void) if (shared) nbdplug_close_handle (shared_handle); free (sockname); + free (tls_certificates); + free (tls_psk); } /* Called for each key=value passed on the command line. This plugin * accepts socket=<sockname>, hostname=<hostname>/port=<port>, or * [uri=]<uri> (exactly one connection required), and optional - * parameters export=<name>, retry=<n> and shared=<bool>. + * parameters export=<name>, retry=<n>, shared=<bool> and various + * tls settings. 
*/ static int nbdplug_config (const char *key, const char *value) @@ -151,6 +161,37 @@ nbdplug_config (const char *key, const char *value) return -1; shared = r; } + else if (strcmp (key, "tls") == 0) { + if (strcasecmp (value, "require") == 0 || + strcasecmp (value, "required") == 0 || + strcasecmp (value, "force") == 0) + tls = 2; + else { + tls = nbdkit_parse_bool (value); + if (tls == -1) + exit (EXIT_FAILURE); + } + } + else if (strcmp (key, "tls-certificates") == 0) { + free (tls_certificates); + tls_certificates = nbdkit_absolute_path (value); + if (!tls_certificates) + return -1; + } + else if (strcmp (key, "tls-verify") == 0) { + r = nbdkit_parse_bool (value); + if (r == -1) + return -1; + tls_verify = r; + } + else if (strcmp (key, "tls-username") == 0) + tls_username = value; + else if (strcmp (key, "tls-psk") == 0) { + free (tls_psk); + tls_psk = nbdkit_absolute_path (value); + if (!tls_psk) + return -1; + } else { nbdkit_error ("unknown parameter '%s'", key); return -1; @@ -208,6 +249,23 @@ nbdplug_config_complete (void) if (!export) export = ""; + if (tls == -1) + tls = tls_certificates || tls_verify >= 0 || tls_username || tls_psk; + if (tls > 0) { + struct nbd_handle *nbd = nbd_create (); + + if (!nbd) { + nbdkit_error ("unable to query libnbd details: %s", nbd_get_error ()); + return -1; + } + if (!nbd_supports_tls (nbd)) { + nbdkit_error ("libnbd was compiled without tls support"); + nbd_close (nbd); + return -1; + } + nbd_close (nbd); + } + if (shared && (shared_handle = nbdplug_open_handle (false)) == NULL) return -1; return 0; @@ -222,6 +280,11 @@ nbdplug_config_complete (void) "retry=<N> Retry connection up to N seconds (default 0).\n" \ "shared=<BOOL> True to share one server connection among all clients,\n" \ " rather than a connection per client (default false).\n" \ + "tls=<MODE> How to use TLS; one of 'off', 'on', or 'require'.\n" \ + "tls-certificates=<DIR> Directory containing files for X.509 certificates.\n" \ + "tls-verify=<BOOL> True 
(default for X.509) to validate server.\n" \ + "tls-username=<NAME> Override username presented in X.509 TLS.\n" \ + "tls-psk=<FILE> File containing Pre-Shared Key for TLS.\n" \ static void nbdplug_dump_plugin (void) @@ -233,6 +296,7 @@ nbdplug_dump_plugin (void) exit (EXIT_FAILURE); } printf ("libnbd_version=%s\n", nbd_get_version (nbd)); + printf ("libnbd_tls=%d\n", nbd_supports_tls (nbd)); printf ("libnbd_uri=%d\n", nbd_supports_uri (nbd)); nbd_close (nbd); } @@ -427,6 +491,17 @@ nbdplug_open_handle (int readonly) goto err; if (nbd_add_meta_context (h->nbd, "base:allocation") == -1) goto err; + if (nbd_set_tls (h->nbd, tls) == -1) + goto err; + if (tls_certificates && + nbd_set_tls_certificates (h->nbd, tls_certificates) == -1) + goto err; + if (tls_verify >= 0 && nbd_set_tls_verify_peer (h->nbd, tls_verify) == -1) + goto err; + if (tls_username && nbd_set_tls_username (h->nbd, tls_username) == -1) + goto err; + if (tls_psk && nbd_set_tls_psk_file (h->nbd, tls_psk) == -1) + goto err; if (uri) r = nbd_connect_uri (h->nbd, uri); else if (sockname) diff --git a/TODO b/TODO index b9ddb1ee..332400b3 100644 --- a/TODO +++ b/TODO @@ -90,13 +90,6 @@ qemu-nbd for these use cases. https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg02971.html is a partial solution but it needs cleaning up. -nbdkit-nbd-plugin could use enhancements: - -* Enable client-side TLS (right now, the nbd plugin allows us to - support an encrypted client connecting to a plain server; but we - would need TLS to support a plain client connecting to an encrypted - server). - nbdkit-floppy-plugin: * Add boot sector support. In theory this is easy (eg. using @@ -159,11 +152,7 @@ Filters allow certain types of composition, but others would not be possible, for example RAIDing over multiple nbd sources. Because the plugin API limits us to loading a single plugin to the server, the best way to do this (and the most robust) is to compose multiple -nbdkit processes. 
- -The nbd plugin (plugins/nbd) already contains an NBD client, so we -could factor this client out and make it available to other plugins to -use. +nbdkit processes. Perhaps libnbd will prove useful for this purpose. Build-related ------------- diff --git a/tests/Makefile.am b/tests/Makefile.am index aaf74502..26ca2a12 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -81,6 +81,8 @@ EXTRA_DIST = \ test-memory-largest.sh \ test-memory-largest-for-qemu.sh \ test-nbd-extents.sh \ + test-nbd-tls.sh \ + test-nbd-tls-psk.sh \ test-nozero.sh \ test-null-extents.sh \ test_ocaml_plugin.ml \ @@ -534,7 +536,9 @@ TESTS += \ # nbd plugin test. LIBGUESTFS_TESTS += test-nbd TESTS += \ - test-nbd-extents.sh + test-nbd-extents.sh \ + test-nbd-tls.sh \ + test-nbd-tls-psk.sh test_nbd_SOURCES = test-nbd.c test.h test_nbd_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS) diff --git a/tests/test-nbd-tls-psk.sh b/tests/test-nbd-tls-psk.sh new file mode 100755 index 00000000..d0bbc468 --- /dev/null +++ b/tests/test-nbd-tls-psk.sh @@ -0,0 +1,81 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2019 Red Hat Inc. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. 
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+source ./functions.sh
+set -e
+set -x
+
+requires qemu-img --version
+
+# Does the nbdkit binary support TLS?
+if ! nbdkit --dump-config | grep -sq tls=yes; then
+    echo "$0: nbdkit built without TLS support"
+    exit 77
+fi
+
+# Does the nbd plugin support TLS?
+if ! nbdkit --dump-plugin nbd | grep -sq libnbd_tls=1; then
+    echo "$0: nbd plugin built without TLS support"
+    exit 77
+fi
+
+# Did we create the PSK keys file?
+# Probably 'certtool' is missing.
+if [ ! -s keys.psk ]; then
+    echo "$0: PSK keys file was not created by the test harness"
+    exit 77
+fi
+
+sock1=`mktemp -u`
+sock2=`mktemp -u`
+pid1="test-nbd-tls-psk.pid1"
+pid2="test-nbd-tls-psk.pid2"
+
+files="$sock1 $sock2 $pid1 $pid2 nbd-tls-psk.out"
+rm -f $files
+cleanup_fn rm -f $files
+
+# Run encrypted server
+start_nbdkit -P "$pid1" -U "$sock1" \
+    --tls=require --tls-psk=keys.psk example1
+
+# Run nbd plugin as intermediary
+LIBNBD_DEBUG=1 start_nbdkit -P "$pid2" -U "$sock2" --tls=off \
+    nbd tls=require tls-psk=keys.psk tls-username=qemu socket="$sock1"
+
+# Run unencrypted client
+LANG=C qemu-img info -f raw "nbd+unix:///?socket=$sock2" > nbd-tls-psk.out
+
+cat nbd-tls-psk.out
+
+grep -sq "^file format: raw" nbd-tls-psk.out
+grep -sq "^virtual size: 100M" nbd-tls-psk.out
diff --git a/tests/test-nbd-tls.sh b/tests/test-nbd-tls.sh
new file mode 100755
index 00000000..af824d23
--- /dev/null
+++ b/tests/test-nbd-tls.sh
@@ -0,0 +1,82 @@
+#!/usr/bin/env bash
+# nbdkit
+# Copyright (C) 2019 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+source ./functions.sh
+set -e
+set -x
+
+requires qemu-img --version
+
+# Does the nbdkit binary support TLS?
+if ! nbdkit --dump-config | grep -sq tls=yes; then
+    echo "$0: nbdkit built without TLS support"
+    exit 77
+fi
+
+# Does the nbd plugin support TLS?
+if ! nbdkit --dump-plugin nbd | grep -sq libnbd_tls=1; then
+    echo "$0: nbd plugin built without TLS support"
+    exit 77
+fi
+
+# Did we create the PKI files?
+# Probably 'certtool' is missing.
+pkidir="$PWD/pki"
+if [ ! -f "$pkidir/ca-cert.pem" ]; then
+    echo "$0: PKI files were not created by the test harness"
+    exit 77
+fi
+
+sock1=`mktemp -u`
+sock2=`mktemp -u`
+pid1="test-nbd-tls.pid1"
+pid2="test-nbd-tls.pid2"
+
+files="$sock1 $sock2 $pid1 $pid2 nbd-tls.out"
+rm -f $files
+cleanup_fn rm -f $files
+
+# Run encrypted server
+start_nbdkit -P "$pid1" -U "$sock1" \
+    --tls=require --tls-certificates="$pkidir" example1
+
+# Run nbd plugin as intermediary
+LIBNBD_DEBUG=1 start_nbdkit -P "$pid2" -U "$sock2" --tls=off \
+    nbd tls=require tls-certificates="$pkidir" socket="$sock1"
+
+# Run unencrypted client
+LANG=C qemu-img info -f raw "nbd+unix:///?socket=$sock2" > nbd-tls.out
+
+cat nbd-tls.out
+
+grep -sq "^file format: raw" nbd-tls.out
+grep -sq "^virtual size: 100M" nbd-tls.out
diff --git a/tests/test-tls-psk.sh b/tests/test-tls-psk.sh
index c5f4f974..393f5893 100755
--- a/tests/test-tls-psk.sh
+++ b/tests/test-tls-psk.sh
@@ -63,7 +63,7 @@ if [ ! -s keys.psk ]; then
 fi
 
 # Unfortunately qemu cannot do TLS over a Unix domain socket (nbdkit
-# probably can, although it is not tested). Find an unused port to
+# can, but that is tested in tests-nbd-tls-psk.sh). Find an unused port to
 # listen on.
 pick_unused_port
 
diff --git a/tests/test-tls.sh b/tests/test-tls.sh
index 4c3d010e..70d40aea 100755
--- a/tests/test-tls.sh
+++ b/tests/test-tls.sh
@@ -55,8 +55,8 @@ if [ ! -f "$pkidir/ca-cert.pem" ]; then
     exit 77
 fi
 
-# Unfortunately qemu cannot do TLS over a Unix domain socket (nbdkit
-# probably can, although it is not tested). Find an unused port to
+# Unfortunately qemu 4.0 cannot do TLS over a Unix domain socket (nbdkit
+# can, but that is tested in tests-nbd-tls.sh). Find an unused port to
 # listen on.
 pick_unused_port
 
--
2.20.1
Richard W.M. Jones
2019-Jun-12 21:18 UTC
Re: [Libguestfs] [nbdkit PATCH v3 0/5] Play with libnbd for nbdkit-nbd
On Wed, Jun 12, 2019 at 04:00:08PM -0500, Eric Blake wrote:
> libnbd-0.1.4-1 is now available in Fedora 29/30 updates testing.
>
> Diffs since v2 - rebase to master, bump from libnbd 0.1.2 to 0.1.3+,
> add tests to TLS usage which flushed out the need to turn relative
> pathnames into absolute, doc tweaks
>
> Now that the testsuite covers TLS and libnbd has been fixed to provide
> the things I found lacking when developing v2, I'm leaning towards
> pushing this on Monday, or sooner if I get a positive review.

This looks reasonable.

There's a slow-down, but we may be able to fix that with a concurrent
writer thread in future.

The pipe-to-self trick is interesting and comes back to the issue of
what is the thread model of libnbd. If we think that libnbd shouldn't
create threads or do tricky Linux-specific stuff internally, then
asking callers to do this is reasonable.

If I understand how this works:

* There is one "reader" thread (per connection) which is doing polls.
  Essentially this is a background thread as far as nbdkit is
  concerned, not a thread that nbdkit has created or knows about.

* There is a pipe-to-self (per connection). When a command is
  submitted from an nbdkit-managed thread a byte is sent down this
  pipe.

* The reader thread can either wake up because the socket is ready for
  read or write, or because of an indication on the pipe-to-self.

* If the socket is ready for read or write then the normal
  nbd_aio_notify_read|write function is called which will move the
  state machine along, and also check for command completion.

* If it's pipe-to-self indication then we will (after checking for
  command completion) check the direction flag again and reenter the
  poll. The reason for this is because when the other thread started
  an AIO command it might have changed the handle state.

* The pipe ensures there is no race (I think).

Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
Eric Blake
2019-Jun-12 22:23 UTC
Re: [Libguestfs] [nbdkit PATCH v3 0/5] Play with libnbd for nbdkit-nbd
On 6/12/19 4:18 PM, Richard W.M. Jones wrote:
> On Wed, Jun 12, 2019 at 04:00:08PM -0500, Eric Blake wrote:
>> libnbd-0.1.4-1 is now available in Fedora 29/30 updates testing.
>>
>> Diffs since v2 - rebase to master, bump from libnbd 0.1.2 to 0.1.3+,
>> add tests to TLS usage which flushed out the need to turn relative
>> pathnames into absolute, doc tweaks
>>
>> Now that the testsuite covers TLS and libnbd has been fixed to provide
>> the things I found lacking when developing v2, I'm leaning towards
>> pushing this on Monday, or sooner if I get a positive review.
>
> This looks reasonable.
>
> There's a slow-down, but we may be able to fix that with a concurrent
> writer thread in future.

I still need to collect updated numbers.

>
> The pipe-to-self trick is interesting and comes back to the issue of
> what is the thread model of libnbd. If we think that libnbd shouldn't
> create threads or do tricky Linux-specific stuff internally, then
> asking callers to do this is reasonable.
>
> If I understand how this works:
>
> * There is one "reader" thread (per connection) which is doing polls.
>   Essentially this is a background thread as far as nbdkit is
>   concerned, not a thread that nbdkit has created or knows about.
>
> * There is a pipe-to-self (per connection). When a command is
>   submitted from an nbdkit-managed thread a byte is sent down this
>   pipe.

Furthermore, an nbdkit-managed thread IS grabbing the libnbd lock, and
either queues the command (behind other pending work) or proceeds to
send() until the state machine blocks (which may be the full command,
but not always for large NBD_CMD_WRITE). Waiting for the libnbd lock
shouldn't be too long since all calls into libnbd (from any thread)
are aio and shouldn't be blocking.

>
> * The reader thread can either wake up because the socket is ready for
>   read or write, or because of an indication on the pipe-to-self.
>
> * If the socket is ready for read or write then the normal
>   nbd_aio_notify_read|write function is called which will move the
>   state machine along, and also check for command completion.
>
> * If it's pipe-to-self indication then we will (after checking for
>   command completion) check the direction flag again and reenter the
>   poll. The reason for this is because when the other thread started
>   an AIO command it might have changed the handle state.

Indeed - if the reader thread was blocked while the handle was READY,
then the nbdkit-managed thread probably got off a send(), and either
the entire message was sent or it blocked and now the reader thread
needs to know that it should check for POLLOUT to finish sending the
message.

>
> * The pipe ensures there is no race (I think).

The libnbd lock ensures no two nbdkit-managed threads are moving the
state machine behind the back of any other thread (whether a different
nbdkit thread's command, or the reader thread), and the pipe ensures
that the reader thread does not deadlock (it doesn't matter which
thread advances the state machine, but the reader thread is the only
one that checks direction and advances the state machine due to
nbd_aio_notify).

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
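The wake-up mechanism discussed in this exchange can be sketched in isolation. This is only an illustration of the general pipe-to-self pattern, not the code from the patch: the names `struct waker`, `waker_init`, `waker_kick`, `waker_drain` and `wait_once` are made up here, and the libnbd calls are left out. In the plugin, the socket-ready case would be where nbd_aio_notify_read|write gets called, and the kick case would be where the direction flag is re-checked before polling again.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical wake-up channel (illustrative names, not from nbd.c). */
struct waker {
  int fds[2];            /* fds[0]: read end (reader thread), fds[1]: write end */
};

enum { WOKE_SOCKET = 1, WOKE_KICK = 2 };

static int set_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL);
  if (flags == -1)
    return -1;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Create the pipe; both ends non-blocking so neither side can stall. */
int waker_init(struct waker *w)
{
  if (pipe(w->fds) == -1)
    return -1;
  if (set_nonblocking(w->fds[0]) == -1 || set_nonblocking(w->fds[1]) == -1) {
    close(w->fds[0]);
    close(w->fds[1]);
    return -1;
  }
  return 0;
}

/* Called by a submitting thread after it has queued a command. */
int waker_kick(struct waker *w)
{
  char c = 0;
  ssize_t r;

  do
    r = write(w->fds[1], &c, 1);
  while (r == -1 && errno == EINTR);
  /* EAGAIN means the pipe is already full, so a wake-up is pending anyway. */
  if (r == -1 && errno != EAGAIN)
    return -1;
  return 0;
}

/* Discard all pending wake-up bytes. */
void waker_drain(struct waker *w)
{
  char buf[64];

  while (read(w->fds[0], buf, sizeof buf) > 0)
    ;
}

/* One iteration of the reader loop: sleep until the socket is ready for
 * sock_events or another thread kicked the pipe.  Returns a bitmask of
 * WOKE_SOCKET / WOKE_KICK, 0 on timeout, -1 on error.  A negative sock
 * is ignored by poll(). */
int wait_once(struct waker *w, int sock, short sock_events, int timeout_ms)
{
  struct pollfd pfd[2] = {
    { .fd = sock,      .events = sock_events },
    { .fd = w->fds[0], .events = POLLIN },
  };
  int r, result = 0;

  do
    r = poll(pfd, 2, timeout_ms);
  while (r == -1 && errno == EINTR);
  if (r == -1)
    return -1;
  if (pfd[0].revents)
    result |= WOKE_SOCKET;     /* caller would advance the state machine */
  if (pfd[1].revents & POLLIN) {
    waker_drain(w);
    result |= WOKE_KICK;       /* caller would re-check direction, re-poll */
  }
  return result;
}
```

Because the kick is a byte sitting in the pipe rather than a signal or flag, a kick that arrives just before the reader enters poll() is not lost: the read end is already readable, so poll() returns immediately.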
Eric Blake
2019-Jun-13 11:55 UTC
Re: [Libguestfs] [nbdkit PATCH v3 3/5] nbd: Use libnbd 0.1.3+
On 6/12/19 4:00 PM, Eric Blake wrote:
> This conversion should be feature compatible with the standalone nbd
> code. Note that the use of libnbd makes the binary for this particular
> plugin fall under an LGPLv2+ license rather than BSD; but the source
> code in nbd.c remains BSD.
>
> A lot of code simply disappears, now that I'm no longer directly
> utilizing the NBD protocol files but relying on libnbd. Coordination
> between threads from nbdkit and a central state machine thread is
> gated by a lock around the pending transition list, using the same
> trick as before of using a semaphore to wake up the right thread after
> the server's reply, regardless of out-of-order handling from the
> server.
>
> Benchmark-wise, using the same setup as in commit e897ed70, I see
> a slight slowdown:
>
> Pre-patch, the runs averaged 0.754s, 2.17E+08 bits/s
> Post-patch, the runs averaged 0.849s, 1.93E+07 bits/s

With the latest libnbd, the numbers are now:

Pre-patch, the runs averaged 16.259s, 1.03E+10 bits/s
Post-patch, the runs averaged 17.585s, 9.54E+09 bits/s

for about 8% degradation.

>
> The explanation is that pre-patch, one nbdkit thread can be write()ing
> the next request to the server at the same time the reader thread is
> read()ing the previous reply; post-patch, libnbd's current structure
> handles all I/O under a lock such that there is no simultaneous
> bidirectional traffic. There are discussions about adding future
> libnbd APIs to address this limitation.

My other insight is that since all our calls are aio, they really
shouldn't be blocking. The biggest delay is thus not waiting for a
lock, but the fact that as long as the state machine is blocked on
POLLIN, we are delaying our next request to the server, whether or not
the server would actually be able to read the request at the moment.
Parallel threads for read() and write() are essential for max
performance under blocking I/O, but shouldn't really matter for
non-blocking I/O, so our real problem is that libnbd's state machine
doesn't have truly independent h->rstate and h->wstate.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
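The "about 8%" figure can be checked against the numbers quoted in the message. A minimal sketch; `percent_change` is a helper name invented for this illustration:

```c
/* Relative change of `after` with respect to `before`, in percent.
 * Positive means `after` is larger.  (Hypothetical helper, for
 * illustration only.) */
double percent_change(double before, double after)
{
  return (after - before) / before * 100.0;
}
```

With the updated figures, `percent_change(16.259, 17.585)` comes to roughly +8.2 (run time went up) and `percent_change(1.03e10, 9.54e9)` to roughly -7.4 (throughput went down), both consistent with the quoted degradation.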