Eric Blake
2019-May-30 03:13 UTC
[Libguestfs] [nbdkit PATCH 0/4] Play with libnbd for nbdkit-add
Patch 1 played with an early draft of Rich's Fedora 30 libnbd package: https://bugzilla.redhat.com/show_bug.cgi?id=1713767#c17 Note that comment 21 provides a newer package 0.1.1-1 with a different API; and that libnbd has more unreleased API changes in the pipeline (whether that will be called 0.2 or 0.1.2); so we'll have to tweak things based on what is actually available in distros. Patch 2 is mechanical, to try and make patch 3 a little easier to follow. Patch 3 shows that libnbd is currently feature-compatible with my hand-rolled nbd client, albeit with an order-of-magnitude slowdown that we'll have to investigate. And patch 4 is my driving reason for this series - it solves a TODO item about allowing the NBD plugin to be the symmetric solution for connecting a new server that requires encryption to an old client that can't speak it. (Although it's late enough at night that I have not actually yet added the tests to 'make check' to exercise it.) Also available at https://repo.or.cz/nbdkit/ericb.git under tag use-libnbd-v1 Eric Blake (4): nbd: Check for libnbd nbd: s/nbd_/nbdplug_/ nbd: Use libnbd 0.1 nbd: Add TLS client support plugins/nbd/nbdkit-nbd-plugin.pod | 77 +- configure.ac | 20 + plugins/nbd/nbd-standalone.c | 1364 +++++++++++++++++++++++++++++ plugins/nbd/nbd.c | 1308 +++++++++------------------ TODO | 13 +- plugins/nbd/Makefile.am | 25 +- 6 files changed, 1860 insertions(+), 947 deletions(-) create mode 100644 plugins/nbd/nbd-standalone.c -- 2.20.1
This patch merely adds the framework to compile the nbd plugin either as stanadlone (the version frozen in time to this commit) or as a client of the brand-new libnbd (a later patch will actually enable that part). Since libnbd does not yet have a stable API, falling back to the standalone version makes sense for a while longer. Signed-off-by: Eric Blake <eblake@redhat.com> --- configure.ac | 21 + plugins/nbd/nbd-standalone.c | 1364 ++++++++++++++++++++++++++++++++++ plugins/nbd/Makefile.am | 17 +- 3 files changed, 1400 insertions(+), 2 deletions(-) create mode 100644 plugins/nbd/nbd-standalone.c diff --git a/configure.ac b/configure.ac index f0b6c4d..77e8d5b 100644 --- a/configure.ac +++ b/configure.ac @@ -703,6 +703,25 @@ AS_IF([test "$with_zlib" != "no"],[ ]) AM_CONDITIONAL([HAVE_ZLIB],[test "x$ZLIB_LIBS" != "x"]) +dnl Check for libnbd (only if you want to compile the full nbd plugin). +dnl XXX API changed between 0.1 and 0.1.1, will likely change again; +dnl hence, this will need revisiting once libnbd declares stable API. +AC_ARG_WITH([libnbd], + [AS_HELP_STRING([--without-libnbd], + [disable nbd plugin @<:@default=check@:>@])], + [], + [with_libnbd=check]) +AS_IF([test "$with_libnbd" != "no"],[ + PKG_CHECK_MODULES([LIBNBD], [libnbd = 0.1],[ + AC_SUBST([LIBNBD_CFLAGS]) + AC_SUBST([LIBNBD_LIBS]) + AC_DEFINE([HAVE_LIBNBD],[1],[libnbd found at compile time.]) + ], + [AC_MSG_WARN([libnbd = 0.1 not found, nbd plugin will be crippled])]) +]) +#test "x$LIBNBD_LIBS" != "x" +AM_CONDITIONAL([HAVE_LIBNBD], [false]) + dnl Check for liblzma (only if you want to compile the xz filter). AC_ARG_WITH([liblzma], [AS_HELP_STRING([--without-liblzma], @@ -974,6 +993,8 @@ feature "iso .................................... " \ test "x$HAVE_ISO_TRUE" = "x" feature "libvirt ................................ " \ test "x$HAVE_LIBVIRT_TRUE" = "x" +feature "nbd .................................... " \ + test "x$HAVE_LIBNBD_TRUE" = "x" feature "ssh .................................... " \ test "x$HAVE_SSH_TRUE" = "x" feature "tar .................................... " \ diff --git a/plugins/nbd/nbd-standalone.c b/plugins/nbd/nbd-standalone.c new file mode 100644 index 0000000..dba46f1 --- /dev/null +++ b/plugins/nbd/nbd-standalone.c @@ -0,0 +1,1364 @@ +/* nbdkit + * Copyright (C) 2017-2019 Red Hat Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdlib.h> +#include <stddef.h> +#include <stdbool.h> +#include <stdio.h> +#include <string.h> +#include <unistd.h> +#include <errno.h> +#include <inttypes.h> +#include <limits.h> +#include <netdb.h> +#include <netinet/in.h> +#include <netinet/tcp.h> +#include <sys/socket.h> +#include <sys/un.h> +#include <assert.h> +#include <pthread.h> +#include <semaphore.h> + +#define NBDKIT_API_VERSION 2 + +#include <nbdkit-plugin.h> +#include "protocol.h" +#include "byte-swapping.h" +#include "cleanup.h" + +/* The per-transaction details */ +struct transaction { + uint64_t cookie; + sem_t sem; + void *buf; + uint64_t offset; + uint32_t count; + uint32_t err; + struct nbdkit_extents *extents; + struct transaction *next; +}; + +/* The per-connection handle */ +struct handle { + /* These fields are read-only once initialized */ + int fd; + int flags; + int64_t size; + bool structured; + bool extents; + pthread_t reader; + + /* Prevents concurrent threads from interleaving writes to server */ + pthread_mutex_t write_lock; + + pthread_mutex_t trans_lock; /* Covers access to all fields below */ + struct transaction *trans; + uint64_t unique; + bool dead; +}; + +/* Connect to server via absolute name of Unix socket */ +static char *sockname; + +/* Connect to server via TCP socket */ +static const char *hostname; +static const char *port; + +/* Human-readable server description */ +static char *servname; + +/* Name of export on remote server, default '', ignored for oldstyle */ +static const char *export; + +/* Number of retries */ +static unsigned long retry; + +/* True to share single server connection among all clients */ +static bool shared; +static struct handle *shared_handle; + +static struct handle *nbd_open_handle (int readonly); +static void nbd_close_handle (struct handle *h); + +static void +nbd_unload (void) +{ + if (shared) + nbd_close_handle (shared_handle); + free (sockname); + free (servname); +} + +/* Called for each key=value passed on the command line. This plugin + * accepts socket=<sockname> or hostname=<hostname>/port=<port> + * (exactly one connection required), and optional parameters + * export=<name>, retry=<n>. + */ +static int +nbd_config (const char *key, const char *value) +{ + char *end; + int r; + + if (strcmp (key, "socket") == 0) { + /* See FILENAMES AND PATHS in nbdkit-plugin(3) */ + free (sockname); + sockname = nbdkit_absolute_path (value); + if (!sockname) + return -1; + } + else if (strcmp (key, "hostname") == 0) + hostname = value; + else if (strcmp (key, "port") == 0) + port = value; + else if (strcmp (key, "export") == 0) + export = value; + else if (strcmp (key, "retry") == 0) { + errno = 0; + retry = strtoul (value, &end, 0); + if (value == end || errno) { + nbdkit_error ("could not parse retry as integer (%s)", value); + return -1; + } + } + else if (strcmp (key, "shared") == 0) { + r = nbdkit_parse_bool (value); + if (r == -1) + return -1; + shared = r; + } + else { + nbdkit_error ("unknown parameter '%s'", key); + return -1; + } + + return 0; +} + +/* Check the user passed exactly one socket description. */ +static int +nbd_config_complete (void) +{ + int r; + + if (sockname) { + struct sockaddr_un sock; + + if (hostname || port) { + nbdkit_error ("cannot mix Unix socket and TCP hostname/port parameters"); + return -1; + } + if (strlen (sockname) > sizeof sock.sun_path) { + nbdkit_error ("socket file name too large"); + return -1; + } + servname = strdup (sockname); + } + else { + if (!hostname) { + nbdkit_error ("must supply socket= or hostname= of external NBD server"); + return -1; + } + if (!port) + port = "10809"; + if (strchr (hostname, ':')) + r = asprintf (&servname, "[%s]:%s", hostname, port); + else + r = asprintf (&servname, "%s:%s", hostname, port); + if (r < 0) { + nbdkit_error ("asprintf: %m"); + return -1; + } + } + + if (!export) + export = ""; + + if (shared && (shared_handle = nbd_open_handle (false)) == NULL) + return -1; + return 0; +} + +#define nbd_config_help \ + "socket=<SOCKNAME> The Unix socket to connect to.\n" \ + "hostname=<HOST> The hostname for the TCP socket to connect to.\n" \ + "port=<PORT> TCP port or service name to use (default 10809).\n" \ + "export=<NAME> Export name to connect to (default \"\").\n" \ + +#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL + +/* Read an entire buffer, returning 0 on success or -1 with errno set. */ +static int +read_full (int fd, void *buf, size_t len) +{ + ssize_t r; + + while (len) { + r = read (fd, buf, len); + if (r < 0) { + if (errno == EINTR || errno == EAGAIN) + continue; + return -1; + } + if (!r) { + /* Unexpected EOF */ + errno = EBADMSG; + return -1; + } + buf += r; + len -= r; + } + return 0; +} + +/* Write an entire buffer, returning 0 on success or -1 with errno set. */ +static int +write_full (int fd, const void *buf, size_t len) +{ + ssize_t r; + + while (len) { + r = write (fd, buf, len); + if (r < 0) { + if (errno == EINTR || errno == EAGAIN) + continue; + return -1; + } + buf += r; + len -= r; + } + return 0; +} + +/* Called during transmission phases when there is no hope of + * resynchronizing with the server, and all further requests from the + * client will fail. Returns -1 for convenience. */ +static int +nbd_mark_dead (struct handle *h) +{ + int err = errno; + + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); + if (!h->dead) { + nbdkit_debug ("permanent failure while talking to server %s: %m", + servname); + h->dead = true; + } + else if (!err) + errno = ESHUTDOWN; + /* NBD only accepts a limited set of errno values over the wire, and + nbdkit converts all other values to EINVAL. If we died due to an + errno value that cannot transmit over the wire, translate it to + ESHUTDOWN instead. */ + if (err == EPIPE || err == EBADMSG) + nbdkit_set_error (ESHUTDOWN); + return -1; +} + +/* Find and possibly remove the transaction corresponding to cookie + from the list. */ +static struct transaction * +find_trans_by_cookie (struct handle *h, uint64_t cookie, bool remove) +{ + struct transaction **ptr; + struct transaction *trans; + + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); + ptr = &h->trans; + while ((trans = *ptr) != NULL) { + if (cookie == trans->cookie) + break; + ptr = &trans->next; + } + if (trans && remove) + *ptr = trans->next; + return trans; +} + +/* Send a request, return 0 on success or -1 on write failure. */ +static int +nbd_request_raw (struct handle *h, uint16_t flags, uint16_t type, + uint64_t offset, uint32_t count, uint64_t cookie, + const void *buf) +{ + struct request req = { + .magic = htobe32 (NBD_REQUEST_MAGIC), + .flags = htobe16 (flags), + .type = htobe16 (type), + .handle = cookie, /* Opaque to server, so endianness doesn't matter */ + .offset = htobe64 (offset), + .count = htobe32 (count), + }; + int r; + + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->write_lock); + nbdkit_debug ("sending request type %d (%s), flags %#x, offset %#" PRIx64 + ", count %#x, cookie %#" PRIx64, type, name_of_nbd_cmd (type), + flags, offset, count, cookie); + r = write_full (h->fd, &req, sizeof req); + if (buf && !r) + r = write_full (h->fd, buf, count); + return r; +} + +/* Perform the request half of a transaction. On success, return the + transaction; on error return NULL. */ +static struct transaction * +nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, + uint64_t offset, uint32_t count, const void *req_buf, + void *rep_buf, struct nbdkit_extents *extents) +{ + int err; + struct transaction *trans; + uint64_t cookie; + + trans = calloc (1, sizeof *trans); + if (!trans) { + nbdkit_error ("unable to track transaction: %m"); + /* Still in sync with server, so don't mark connection dead */ + return NULL; + } + if (sem_init (&trans->sem, 0, 0)) { + nbdkit_error ("unable to create semaphore: %m"); + /* Still in sync with server, so don't mark connection dead */ + free (trans); + return NULL; + } + trans->buf = rep_buf; + trans->count = rep_buf ? count : 0; + trans->offset = offset; + trans->extents = extents; + { + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); + if (h->dead) + goto err; + cookie = trans->cookie = h->unique++; + trans->next = h->trans; + h->trans = trans; + } + if (nbd_request_raw (h, flags, type, offset, count, cookie, req_buf) == 0) + return trans; + trans = find_trans_by_cookie (h, cookie, true); + + err: + err = errno; + if (sem_destroy (&trans->sem)) + abort (); + free (trans); + nbd_mark_dead (h); + errno = err; + return NULL; +} + +/* Shorthand for nbd_request_full when no extra buffers are involved. */ +static struct transaction * +nbd_request (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset, + uint32_t count) +{ + return nbd_request_full (h, flags, type, offset, count, NULL, NULL, NULL); +} + +/* Read a reply, and look up the corresponding transaction. + Return the server's non-negative answer (converted to local errno + value) on success, or -1 on read failure. If structured replies + were negotiated, trans_out is set to NULL if there are still more replies + expected. */ +static int +nbd_reply_raw (struct handle *h, struct transaction **trans_out) +{ + union { + struct simple_reply simple; + struct structured_reply structured; + } rep; + struct transaction *trans; + void *buf = NULL; + CLEANUP_FREE char *payload = NULL; + uint32_t count; + uint32_t id; + struct block_descriptor *extents = NULL; + size_t nextents = 0; + int error = NBD_SUCCESS; + bool more = false; + uint32_t len = 0; /* 0 except for structured reads */ + uint64_t offset = 0; /* if len, absolute offset of structured read chunk */ + bool zero = false; /* if len, whether to read or memset */ + uint16_t errlen; + + *trans_out = NULL; + /* magic and handle overlap between simple and structured replies */ + if (read_full (h->fd, &rep, sizeof rep.simple)) + return nbd_mark_dead (h); + rep.simple.magic = be32toh (rep.simple.magic); + switch (rep.simple.magic) { + case NBD_SIMPLE_REPLY_MAGIC: + nbdkit_debug ("received simple reply for cookie %#" PRIx64 ", status %s", + rep.simple.handle, + name_of_nbd_error (be32toh (rep.simple.error))); + error = be32toh (rep.simple.error); + break; + case NBD_STRUCTURED_REPLY_MAGIC: + if (!h->structured) { + nbdkit_error ("structured response without negotiation"); + return nbd_mark_dead (h); + } + if (read_full (h->fd, sizeof rep.simple + (char *) &rep, + sizeof rep - sizeof rep.simple)) + return nbd_mark_dead (h); + rep.structured.flags = be16toh (rep.structured.flags); + rep.structured.type = be16toh (rep.structured.type); + rep.structured.length = be32toh (rep.structured.length); + nbdkit_debug ("received structured reply %s for cookie %#" PRIx64 + ", payload length %" PRId32, + name_of_nbd_reply_type (rep.structured.type), + rep.structured.handle, rep.structured.length); + if (rep.structured.length > 64 * 1024 * 1024) { + nbdkit_error ("structured reply length is suspiciously large: %" PRId32, + rep.structured.length); + return nbd_mark_dead (h); + } + if (rep.structured.length) { + /* Special case for OFFSET_DATA in order to read tail of chunk + directly into final buffer later on */ + len = (rep.structured.type == NBD_REPLY_TYPE_OFFSET_DATA && + rep.structured.length > sizeof offset) ? sizeof offset : + rep.structured.length; + payload = malloc (len); + if (!payload) { + nbdkit_error ("reading structured reply payload: %m"); + return nbd_mark_dead (h); + } + if (read_full (h->fd, payload, len)) + return nbd_mark_dead (h); + len = 0; + } + more = !(rep.structured.flags & NBD_REPLY_FLAG_DONE); + switch (rep.structured.type) { + case NBD_REPLY_TYPE_NONE: + if (rep.structured.length) { + nbdkit_error ("NBD_REPLY_TYPE_NONE with invalid payload"); + return nbd_mark_dead (h); + } + if (more) { + nbdkit_error ("NBD_REPLY_TYPE_NONE without done flag"); + return nbd_mark_dead (h); + } + break; + case NBD_REPLY_TYPE_OFFSET_DATA: + if (rep.structured.length <= sizeof offset) { + nbdkit_error ("structured reply OFFSET_DATA too small"); + return nbd_mark_dead (h); + } + memcpy (&offset, payload, sizeof offset); + offset = be64toh (offset); + len = rep.structured.length - sizeof offset; + break; + case NBD_REPLY_TYPE_OFFSET_HOLE: + if (rep.structured.length != sizeof offset + sizeof len) { + nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); + return nbd_mark_dead (h); + } + memcpy (&offset, payload, sizeof offset); + offset = be64toh (offset); + memcpy (&len, payload, sizeof len); + len = be32toh (len); + if (!len) { + nbdkit_error ("structured reply OFFSET_HOLE length incorrect"); + return nbd_mark_dead (h); + } + zero = true; + break; + case NBD_REPLY_TYPE_BLOCK_STATUS: + if (!h->extents) { + nbdkit_error ("block status response without negotiation"); + return nbd_mark_dead (h); + } + if (rep.structured.length < sizeof *extents || + rep.structured.length % sizeof *extents != sizeof id) { + nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); + return nbd_mark_dead (h); + } + nextents = rep.structured.length / sizeof *extents; + extents = (struct block_descriptor *) &payload[sizeof id]; + memcpy (&id, payload, sizeof id); + id = be32toh (id); + nbdkit_debug ("parsing %zu extents for context id %" PRId32, + nextents, id); + break; + default: + if (!NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) { + nbdkit_error ("received unexpected structured reply %s", + name_of_nbd_reply_type (rep.structured.type)); + return nbd_mark_dead (h); + } + + if (rep.structured.length < sizeof error + sizeof errlen) { + nbdkit_error ("structured reply error size incorrect"); + return nbd_mark_dead (h); + } + memcpy (&errlen, payload + sizeof error, sizeof errlen); + errlen = be16toh (errlen); + if (errlen > rep.structured.length - sizeof error - sizeof errlen) { + nbdkit_error ("structured reply error message size incorrect"); + return nbd_mark_dead (h); + } + memcpy (&error, payload, sizeof error); + error = be32toh (error); + if (errlen) + nbdkit_debug ("received structured error %s with message: %.*s", + name_of_nbd_error (error), (int) errlen, + payload + sizeof error + sizeof errlen); + else + nbdkit_debug ("received structured error %s without message", + name_of_nbd_error (error)); + } + break; + + default: + nbdkit_error ("received unexpected magic in reply: %#" PRIx32, + rep.simple.magic); + return nbd_mark_dead (h); + } + + trans = find_trans_by_cookie (h, rep.simple.handle, !more); + if (!trans) { + nbdkit_error ("reply with unexpected cookie %#" PRIx64, rep.simple.handle); + return nbd_mark_dead (h); + } + + buf = trans->buf; + count = trans->count; + if (nextents) { + if (!trans->extents) { + nbdkit_error ("block status response to a non-status command"); + return nbd_mark_dead (h); + } + offset = trans->offset; + for (size_t i = 0; i < nextents; i++) { + /* We rely on the fact that NBDKIT_EXTENT_* match NBD_STATE_* */ + if (nbdkit_add_extent (trans->extents, offset, + be32toh (extents[i].length), + be32toh (extents[i].status_flags)) == -1) { + error = errno; + break; + } + offset += be32toh (extents[i].length); + } + } + if (buf && h->structured && rep.simple.magic == NBD_SIMPLE_REPLY_MAGIC) { + nbdkit_error ("simple read reply when structured was expected"); + return nbd_mark_dead (h); + } + if (len) { + if (!buf) { + nbdkit_error ("structured read response to a non-read command"); + return nbd_mark_dead (h); + } + if (offset < trans->offset || offset > INT64_MAX || + offset + len > trans->offset + count) { + nbdkit_error ("structured read reply with unexpected offset/length"); + return nbd_mark_dead (h); + } + buf = (char *) buf + offset - trans->offset; + if (zero) { + memset (buf, 0, len); + buf = NULL; + } + else + count = len; + } + + /* Thanks to structured replies, we must preserve an error in any + earlier chunk for replay during the final chunk. */ + if (!more) { + *trans_out = trans; + if (!error) + error = trans->err; + } + else if (error && !trans->err) + trans->err = error; + + /* Convert from wire value to local errno, and perform any final read */ + switch (error) { + case NBD_SUCCESS: + if (buf && read_full (h->fd, buf, count)) + return nbd_mark_dead (h); + return 0; + case NBD_EPERM: + return EPERM; + case NBD_EIO: + return EIO; + case NBD_ENOMEM: + return ENOMEM; + default: + nbdkit_debug ("unexpected error %d, squashing to EINVAL", error); + /* fallthrough */ + case NBD_EINVAL: + return EINVAL; + case NBD_ENOSPC: + return ENOSPC; + case NBD_EOVERFLOW: + return EOVERFLOW; + case NBD_ESHUTDOWN: + return ESHUTDOWN; + } +} + +/* Reader loop. */ +void * +nbd_reader (void *handle) +{ + struct handle *h = handle; + bool done = false; + int r; + + while (!done) { + struct transaction *trans; + + r = nbd_reply_raw (h, &trans); + if (r >= 0) { + if (!trans) + nbdkit_debug ("partial reply handled, waiting for final reply"); + else { + trans->err = r; + if (sem_post (&trans->sem)) { + nbdkit_error ("failed to post semaphore: %m"); + abort (); + } + } + } + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); + done = h->dead; + } + + /* Clean up any stranded in-flight requests */ + r = ESHUTDOWN; + while (1) { + struct transaction *trans; + + { + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); + trans = h->trans; + h->trans = trans ? trans->next : NULL; + } + if (!trans) + break; + trans->err = r; + if (sem_post (&trans->sem)) { + nbdkit_error ("failed to post semaphore: %m"); + abort (); + } + } + return NULL; +} + +/* Perform the reply half of a transaction. */ +static int +nbd_reply (struct handle *h, struct transaction *trans) +{ + int err; + + if (!trans) { + assert (errno); + return -1; + } + + while ((err = sem_wait (&trans->sem)) == -1 && errno == EINTR) + /* try again */; + if (err) { + nbdkit_debug ("failed to wait on semaphore: %m"); + err = EIO; + } + else + err = trans->err; + if (sem_destroy (&trans->sem)) + abort (); + free (trans); + errno = err; + return err ? -1 : 0; +} + +/* Receive response to @option into @reply, and consume any + payload. If @payload is non-NULL, caller must free *payload. Return + 0 on success, or -1 if communication to server is no longer + possible. */ +static int +nbd_newstyle_recv_option_reply (struct handle *h, uint32_t option, + struct fixed_new_option_reply *reply, + void **payload) +{ + CLEANUP_FREE char *buffer = NULL; + + if (payload) + *payload = NULL; + if (read_full (h->fd, reply, sizeof *reply)) { + nbdkit_error ("unable to read option reply: %m"); + return -1; + } + reply->magic = be64toh (reply->magic); + reply->option = be32toh (reply->option); + reply->reply = be32toh (reply->reply); + reply->replylen = be32toh (reply->replylen); + if (reply->magic != NBD_REP_MAGIC || reply->option != option) { + nbdkit_error ("unexpected option reply"); + return -1; + } + if (reply->replylen) { + if (reply->reply == NBD_REP_ACK) { + nbdkit_error ("NBD_REP_ACK should not have replylen %" PRId32, + reply->replylen); + return -1; + } + if (reply->replylen > 16 * 1024 * 1024) { + nbdkit_error ("option reply length is suspiciously large: %" PRId32, + reply->replylen); + return -1; + } + /* buffer is a string for NBD_REP_ERR_*; adding a NUL terminator + makes that string easier to use, without hurting other reply + types where buffer is not a string */ + buffer = malloc (reply->replylen + 1); + if (!buffer) { + nbdkit_error ("malloc: %m"); + return -1; + } + if (read_full (h->fd, buffer, reply->replylen)) { + nbdkit_error ("unable to read option reply payload: %m"); + return -1; + } + buffer[reply->replylen] = '\0'; + if (!payload) + nbdkit_debug ("ignoring option reply payload"); + else { + *payload = buffer; + buffer = NULL; + } + } + return 0; +} + +/* Attempt to negotiate structured reads, block status, and NBD_OPT_GO. + Return 1 if haggling completed, 0 if haggling failed but + NBD_OPT_EXPORT_NAME is still viable, or -1 on inability to connect. */ +static int +nbd_newstyle_haggle (struct handle *h) +{ + const char *const query = "base:allocation"; + struct new_option opt; + uint32_t exportnamelen = htobe32 (strlen (export)); + uint32_t nrqueries = htobe32 (1); + uint32_t querylen = htobe32 (strlen (query)); + /* For now, we make no NBD_INFO_* requests, relying on the server to + send its defaults. TODO: nbdkit should let plugins report block + sizes, at which point we should request NBD_INFO_BLOCK_SIZE and + obey any sizes set by server. */ + uint16_t nrinfos = htobe16 (0); + struct fixed_new_option_reply reply; + + nbdkit_debug ("trying NBD_OPT_STRUCTURED_REPLY"); + opt.version = htobe64 (NEW_VERSION); + opt.option = htobe32 (NBD_OPT_STRUCTURED_REPLY); + opt.optlen = htobe32 (0); + if (write_full (h->fd, &opt, sizeof opt)) { + nbdkit_error ("unable to request NBD_OPT_STRUCTURED_REPLY: %m"); + return -1; + } + if (nbd_newstyle_recv_option_reply (h, NBD_OPT_STRUCTURED_REPLY, &reply, + NULL) < 0) + return -1; + if (reply.reply == NBD_REP_ACK) { + nbdkit_debug ("structured replies enabled, trying NBD_OPT_SET_META_CONTEXT"); + h->structured = true; + + opt.version = htobe64 (NEW_VERSION); + opt.option = htobe32 (NBD_OPT_SET_META_CONTEXT); + opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) + + sizeof nrqueries + sizeof querylen + strlen (query)); + if (write_full (h->fd, &opt, sizeof opt) || + write_full (h->fd, &exportnamelen, sizeof exportnamelen) || + write_full (h->fd, export, strlen (export)) || + write_full (h->fd, &nrqueries, sizeof nrqueries) || + write_full (h->fd, &querylen, sizeof querylen) || + write_full (h->fd, query, strlen (query))) { + nbdkit_error ("unable to request NBD_OPT_SET_META_CONTEXT: %m"); + return -1; + } + if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, + NULL) < 0) + return -1; + if (reply.reply == NBD_REP_META_CONTEXT) { + /* Cheat: we asked for exactly one context. We could double + check that the server is replying with exactly the + "base:allocation" context, and then remember the id it tells + us to later confirm that responses to NBD_CMD_BLOCK_STATUS + match up; but in the absence of multiple contexts, it's + easier to just assume the server is compliant, and will reuse + the same id, without bothering to check further. */ + nbdkit_debug ("extents enabled"); + h->extents = true; + if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, + NULL) < 0) + return -1; + } + if (reply.reply != NBD_REP_ACK) { + if (h->extents) { + nbdkit_error ("unexpected response to set meta context"); + return -1; + } + nbdkit_debug ("ignoring meta context response %s", + name_of_nbd_rep (reply.reply)); + } + } + else { + nbdkit_debug ("structured replies disabled"); + } + + /* Try NBD_OPT_GO */ + nbdkit_debug ("trying NBD_OPT_GO"); + opt.version = htobe64 (NEW_VERSION); + opt.option = htobe32 (NBD_OPT_GO); + opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) + + sizeof nrinfos); + if (write_full (h->fd, &opt, sizeof opt) || + write_full (h->fd, &exportnamelen, sizeof exportnamelen) || + write_full (h->fd, export, strlen (export)) || + write_full (h->fd, &nrinfos, sizeof nrinfos)) { + nbdkit_error ("unable to request NBD_OPT_GO: %m"); + return -1; + } + while (1) { + CLEANUP_FREE void *buffer; + struct fixed_new_option_reply_info_export *reply_export; + uint16_t info; + + if (nbd_newstyle_recv_option_reply (h, NBD_OPT_GO, &reply, &buffer) < 0) + return -1; + switch (reply.reply) { + case NBD_REP_INFO: + /* Parse payload, but ignore all except NBD_INFO_EXPORT */ + if (reply.replylen < 2) { + nbdkit_error ("NBD_REP_INFO reply too short"); + return -1; + } + memcpy (&info, buffer, sizeof info); + info = be16toh (info); + switch (info) { + case NBD_INFO_EXPORT: + if (reply.replylen != sizeof *reply_export) { + nbdkit_error ("NBD_INFO_EXPORT reply wrong size"); + return -1; + } + reply_export = buffer; + h->size = be64toh (reply_export->exportsize); + h->flags = be16toh (reply_export->eflags); + break; + default: + nbdkit_debug ("ignoring server info %d", info); + } + break; + case NBD_REP_ACK: + /* End of replies, valid if server already sent NBD_INFO_EXPORT, + observable since h->flags must contain NBD_FLAG_HAS_FLAGS */ + assert (!buffer); + if (!h->flags) { + nbdkit_error ("server omitted NBD_INFO_EXPORT reply to NBD_OPT_GO"); + return -1; + } + nbdkit_debug ("NBD_OPT_GO complete"); + return 1; + case NBD_REP_ERR_UNSUP: + /* Special case this failure to fall back to NBD_OPT_EXPORT_NAME */ + nbdkit_debug ("server lacks NBD_OPT_GO support"); + return 0; + default: + /* Unexpected. Either the server sent a legitimate error or an + unexpected reply, but either way, we can't connect. */ + if (NBD_REP_IS_ERR (reply.reply)) + if (reply.replylen) + nbdkit_error ("server rejected NBD_OPT_GO with %s: %s", + name_of_nbd_rep (reply.reply), (char *) buffer); + else + nbdkit_error ("server rejected NBD_OPT_GO with %s", + name_of_nbd_rep (reply.reply)); + else + nbdkit_error ("server used unexpected reply %s to NBD_OPT_GO", + name_of_nbd_rep (reply.reply)); + return -1; + } + } +} + +/* Connect to a Unix socket, returning the fd on success */ +static int +nbd_connect_unix (void) +{ + struct sockaddr_un sock = { .sun_family = AF_UNIX }; + int fd; + + nbdkit_debug ("connecting to Unix socket name=%s", sockname); + fd = socket (AF_UNIX, SOCK_STREAM, 0); + if (fd < 0) { + nbdkit_error ("socket: %m"); + return -1; + } + + /* We already validated length during nbd_config_complete */ + assert (strlen (sockname) <= sizeof sock.sun_path); + memcpy (sock.sun_path, sockname, strlen (sockname)); + if (connect (fd, (const struct sockaddr *) &sock, sizeof sock) < 0) { + nbdkit_error ("connect: %m"); + return -1; + } + return fd; +} + +/* Connect to a TCP socket, returning the fd on success */ +static int +nbd_connect_tcp (void) +{ + struct addrinfo hints = { .ai_family = AF_UNSPEC, + .ai_socktype = SOCK_STREAM, }; + struct addrinfo *result, *rp; + int r; + const int optval = 1; + int fd; + + nbdkit_debug ("connecting to TCP socket host=%s port=%s", hostname, port); + r = getaddrinfo (hostname, port, &hints, &result); + if (r != 0) { + nbdkit_error ("getaddrinfo: %s", gai_strerror (r)); + return -1; + } + + for (rp = result; rp; rp = rp->ai_next) { + fd = socket (rp->ai_family, rp->ai_socktype, rp->ai_protocol); + if (fd == -1) + continue; + if (connect (fd, rp->ai_addr, rp->ai_addrlen) != -1) + break; + close (fd); + } + freeaddrinfo (result); + if (rp == NULL) { + nbdkit_error ("connect: %m"); + close (fd); + return -1; + } + + if (setsockopt (fd, IPPROTO_TCP, TCP_NODELAY, &optval, + sizeof (int)) == -1) { + nbdkit_error ("cannot set TCP_NODELAY option: %m"); + close (fd); + return -1; + } + return fd; +} + +/* Create the shared or per-connection handle. */ +static struct handle * +nbd_open_handle (int readonly) +{ + struct handle *h; + struct old_handshake old; + uint64_t version; + + h = calloc (1, sizeof *h); + if (h == NULL) { + nbdkit_error ("malloc: %m"); + return NULL; + } + + retry: + if (sockname) + h->fd = nbd_connect_unix (); + else + h->fd = nbd_connect_tcp (); + if (h->fd == -1) { + if (retry--) { + sleep (1); + goto retry; + } + goto err; + } + + /* old and new handshake share same meaning of first 16 bytes */ + if (read_full (h->fd, &old, offsetof (struct old_handshake, exportsize))) { + nbdkit_error ("unable to read magic: %m"); + goto err; + } + if (strncmp (old.nbdmagic, "NBDMAGIC", sizeof old.nbdmagic)) { + nbdkit_error ("wrong magic, %s is not an NBD server", servname); + goto err; + } + version = be64toh (old.version); + if (version == OLD_VERSION) { + nbdkit_debug ("trying oldstyle connection"); + if (read_full (h->fd, + (char *) &old + offsetof (struct old_handshake, exportsize), + sizeof old - offsetof (struct old_handshake, exportsize))) { + nbdkit_error ("unable to read old handshake: %m"); + goto err; + } + h->size = be64toh (old.exportsize); + h->flags = be16toh (old.eflags); + } + else if (version == NEW_VERSION) { + uint16_t gflags; + uint32_t cflags; + struct new_option opt; + struct new_handshake_finish finish; + size_t expect; + + nbdkit_debug ("trying newstyle connection"); + if (read_full (h->fd, &gflags, sizeof gflags)) { + nbdkit_error ("unable to read global flags: %m"); + goto err; + } + gflags = be16toh (gflags); + cflags = htobe32 (gflags & (NBD_FLAG_FIXED_NEWSTYLE | NBD_FLAG_NO_ZEROES)); + if (write_full (h->fd, &cflags, sizeof cflags)) { + nbdkit_error ("unable to return global flags: %m"); + goto err; + } + + /* Prefer NBD_OPT_GO if possible */ + if (gflags & NBD_FLAG_FIXED_NEWSTYLE) { + int rc = nbd_newstyle_haggle (h); + if (rc < 0) + goto err; + if (!rc) + goto export_name; + } + else { + export_name: + /* Option haggling untried or failed, use older NBD_OPT_EXPORT_NAME */ + nbdkit_debug ("trying NBD_OPT_EXPORT_NAME"); + opt.version = htobe64 (NEW_VERSION); + opt.option = htobe32 (NBD_OPT_EXPORT_NAME); + opt.optlen = htobe32 (strlen (export)); + if (write_full (h->fd, &opt, sizeof opt) || + write_full (h->fd, export, strlen (export))) { + nbdkit_error ("unable to request export '%s': %m", export); + goto err; + } + expect = sizeof finish; + if (gflags & NBD_FLAG_NO_ZEROES) + expect -= sizeof finish.zeroes; + if (read_full (h->fd, &finish, expect)) { + nbdkit_error ("unable to read new handshake: %m"); + goto err; + } + h->size = be64toh (finish.exportsize); + h->flags = be16toh (finish.eflags); + } + } + else { + nbdkit_error ("unexpected version %#" PRIx64, version); + goto err; + } + if (readonly) + h->flags |= NBD_FLAG_READ_ONLY; + + /* Spawn a dedicated reader thread */ + if ((errno = pthread_mutex_init (&h->write_lock, NULL))) { + nbdkit_error ("failed to initialize write mutex: %m"); + goto err; + } + if ((errno = pthread_mutex_init (&h->trans_lock, NULL))) { + nbdkit_error ("failed to initialize transaction mutex: %m"); + pthread_mutex_destroy (&h->write_lock); + goto err; + } + if ((errno = pthread_create (&h->reader, NULL, nbd_reader, h))) { + nbdkit_error ("failed to initialize reader thread: %m"); + pthread_mutex_destroy (&h->write_lock); + pthread_mutex_destroy (&h->trans_lock); + goto err; + } + + return h; + + err: + if (h->fd >= 0) + close (h->fd); + free (h); + return NULL; +} + +/* Create the per-connection handle. */ +static void * +nbd_open (int readonly) +{ + if (shared) + return shared_handle; + return nbd_open_handle (readonly); +} + +/* Free up the shared or per-connection handle. */ +static void +nbd_close_handle (struct handle *h) +{ + if (!h->dead) { + nbd_request_raw (h, 0, NBD_CMD_DISC, 0, 0, 0, NULL); + shutdown (h->fd, SHUT_WR); + } + if ((errno = pthread_join (h->reader, NULL))) + nbdkit_debug ("failed to join reader thread: %m"); + close (h->fd); + pthread_mutex_destroy (&h->write_lock); + pthread_mutex_destroy (&h->trans_lock); + free (h); +} + +/* Free up the per-connection handle. */ +static void +nbd_close (void *handle) +{ + struct handle *h = handle; + + if (!shared) + nbd_close_handle (h); +} + +/* Get the file size. */ +static int64_t +nbd_get_size (void *handle) +{ + struct handle *h = handle; + + return h->size; +} + +static int +nbd_can_write (void *handle) +{ + struct handle *h = handle; + + return !(h->flags & NBD_FLAG_READ_ONLY); +} + +static int +nbd_can_flush (void *handle) +{ + struct handle *h = handle; + + return h->flags & NBD_FLAG_SEND_FLUSH; +} + +static int +nbd_is_rotational (void *handle) +{ + struct handle *h = handle; + + return h->flags & NBD_FLAG_ROTATIONAL; +} + +static int +nbd_can_trim (void *handle) +{ + struct handle *h = handle; + + return h->flags & NBD_FLAG_SEND_TRIM; +} + +static int +nbd_can_zero (void *handle) +{ + struct handle *h = handle; + + return h->flags & NBD_FLAG_SEND_WRITE_ZEROES; +} + +static int +nbd_can_fua (void *handle) +{ + struct handle *h = handle; + + return h->flags & NBD_FLAG_SEND_FUA ? NBDKIT_FUA_NATIVE : NBDKIT_FUA_NONE; +} + +static int +nbd_can_multi_conn (void *handle) +{ + struct handle *h = handle; + + return h->flags & NBD_FLAG_CAN_MULTI_CONN; +} + +static int +nbd_can_cache (void *handle) +{ + struct handle *h = handle; + + if (h->flags & NBD_FLAG_SEND_CACHE) + return NBDKIT_CACHE_NATIVE; + return NBDKIT_CACHE_NONE; +} + +static int +nbd_can_extents (void *handle) +{ + struct handle *h = handle; + + return h->extents; +} + +/* Read data from the file. */ +static int +nbd_pread (void *handle, void *buf, uint32_t count, uint64_t offset, + uint32_t flags) +{ + struct handle *h = handle; + struct transaction *s; + + assert (!flags); + s = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf, NULL); + return nbd_reply (h, s); +} + +/* Write data to the file. */ +static int +nbd_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, + uint32_t flags) +{ + struct handle *h = handle; + struct transaction *s; + + assert (!(flags & ~NBDKIT_FLAG_FUA)); + s = nbd_request_full (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, + NBD_CMD_WRITE, offset, count, buf, NULL, NULL); + return nbd_reply (h, s); +} + +/* Write zeroes to the file. */ +static int +nbd_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + struct handle *h = handle; + struct transaction *s; + int f = 0; + + assert (!(flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM))); + assert (h->flags & NBD_FLAG_SEND_WRITE_ZEROES); + + if (!(flags & NBDKIT_FLAG_MAY_TRIM)) + f |= NBD_CMD_FLAG_NO_HOLE; + if (flags & NBDKIT_FLAG_FUA) + f |= NBD_CMD_FLAG_FUA; + s = nbd_request (h, f, NBD_CMD_WRITE_ZEROES, offset, count); + return nbd_reply (h, s); +} + +/* Trim a portion of the file. */ +static int +nbd_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + struct handle *h = handle; + struct transaction *s; + + assert (!(flags & ~NBDKIT_FLAG_FUA)); + s = nbd_request (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, + NBD_CMD_TRIM, offset, count); + return nbd_reply (h, s); +} + +/* Flush the file to disk. */ +static int +nbd_flush (void *handle, uint32_t flags) +{ + struct handle *h = handle; + struct transaction *s; + + assert (!flags); + s = nbd_request (h, 0, NBD_CMD_FLUSH, 0, 0); + return nbd_reply (h, s); +} + +/* Read extents of the file. */ +static int +nbd_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) +{ + struct handle *h = handle; + struct transaction *s; + + assert (!(flags & ~NBDKIT_FLAG_REQ_ONE) && h->extents); + s = nbd_request_full (h, flags & NBDKIT_FLAG_REQ_ONE ? NBD_CMD_FLAG_REQ_ONE : 0, + NBD_CMD_BLOCK_STATUS, offset, count, NULL, NULL, + extents); + return nbd_reply (h, s); +} + +/* Cache a portion of the file. */ +static int +nbd_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + struct handle *h = handle; + struct transaction *s; + + assert (!flags); + s = nbd_request (h, 0, NBD_CMD_CACHE, offset, count); + return nbd_reply (h, s); +} + +static struct nbdkit_plugin plugin = { + .name = "nbd", + .longname = "nbdkit nbd plugin", + .version = PACKAGE_VERSION, + .unload = nbd_unload, + .config = nbd_config, + .config_complete = nbd_config_complete, + .config_help = nbd_config_help, + .open = nbd_open, + .close = nbd_close, + .get_size = nbd_get_size, + .can_write = nbd_can_write, + .can_flush = nbd_can_flush, + .is_rotational = nbd_is_rotational, + .can_trim = nbd_can_trim, + .can_zero = nbd_can_zero, + .can_fua = nbd_can_fua, + .can_multi_conn = nbd_can_multi_conn, + .can_extents = nbd_can_extents, + .can_cache = nbd_can_cache, + .pread = nbd_pread, + .pwrite = nbd_pwrite, + .zero = nbd_zero, + .flush = nbd_flush, + .trim = nbd_trim, + .extents = nbd_extents, + .cache = nbd_cache, + .errno_is_preserved = 1, +}; + +NBDKIT_REGISTER_PLUGIN (plugin) diff --git a/plugins/nbd/Makefile.am b/plugins/nbd/Makefile.am index 7368e59..bfc2a83 100644 --- a/plugins/nbd/Makefile.am +++ b/plugins/nbd/Makefile.am @@ -1,5 +1,5 @@ # nbdkit -# Copyright (C) 2017 Red Hat Inc. +# Copyright (C) 2017-2019 Red Hat Inc. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are @@ -36,7 +36,6 @@ EXTRA_DIST = nbdkit-nbd-plugin.pod plugin_LTLIBRARIES = nbdkit-nbd-plugin.la nbdkit_nbd_plugin_la_SOURCES = \ - nbd.c \ $(top_srcdir)/include/nbdkit-plugin.h nbdkit_nbd_plugin_la_CPPFLAGS = \ @@ -54,6 +53,20 @@ nbdkit_nbd_plugin_la_LIBADD = \ $(top_builddir)/common/protocol/libprotocol.la \ $(top_builddir)/common/utils/libutils.la +# TODO: drop standalone version, which is locked at nbdkit 1.13.4 behavior, +# once libnbd is more commonly available with stable API. +if HAVE_LIBNBD +nbdkit_nbd_plugin_la_SOURCES += \ + nbd.c +nbdkit_nbd_plugin_la_CFLAGS += \ + $(LIBNBD_CFLAGS) +nbdkit_nbd_plugin_la_LIBADD += \ + $(LIBNBD_LIBS) +else !HAVE_LIBNBD +nbdkit_nbd_plugin_la_SOURCES += \ + nbd-standalone.c +endif !HAVE_LIBNBD + if HAVE_POD man_MANS = nbdkit-nbd-plugin.1 -- 2.20.1
libnbd uses the 'nbd_' prefix for its public functions, many of which clash with our internal uses. In order to compile with -lnbd, we need to rename our internal functions. No semantic change, although to test it, configure.ac now makes an actual decision on which of the two .c files to compile based on the presence of libnbd. Signed-off-by: Eric Blake <eblake@redhat.com> --- configure.ac | 3 +- plugins/nbd/nbd.c | 270 +++++++++++++++++++++++----------------------- 2 files changed, 136 insertions(+), 137 deletions(-) diff --git a/configure.ac b/configure.ac index 77e8d5b..981813f 100644 --- a/configure.ac +++ b/configure.ac @@ -719,8 +719,7 @@ AS_IF([test "$with_libnbd" != "no"],[ ], [AC_MSG_WARN([libnbd = 0.1 not found, nbd plugin will be crippled])]) ]) -#test "x$LIBNBD_LIBS" != "x" -AM_CONDITIONAL([HAVE_LIBNBD], [false]) +AM_CONDITIONAL([HAVE_LIBNBD], [test "x$LIBNBD_LIBS" != "x"]) dnl Check for liblzma (only if you want to compile the xz filter). AC_ARG_WITH([liblzma], diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index dba46f1..4daf8a4 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -108,14 +108,14 @@ static unsigned long retry; static bool shared; static struct handle *shared_handle; -static struct handle *nbd_open_handle (int readonly); -static void nbd_close_handle (struct handle *h); +static struct handle *nbdplug_open_handle (int readonly); +static void nbdplug_close_handle (struct handle *h); static void -nbd_unload (void) +nbdplug_unload (void) { if (shared) - nbd_close_handle (shared_handle); + nbdplug_close_handle (shared_handle); free (sockname); free (servname); } @@ -126,7 +126,7 @@ nbd_unload (void) * export=<name>, retry=<n>. */ static int -nbd_config (const char *key, const char *value) +nbdplug_config (const char *key, const char *value) { char *end; int r; @@ -168,7 +168,7 @@ nbd_config (const char *key, const char *value) /* Check the user passed exactly one socket description. */ static int -nbd_config_complete (void) +nbdplug_config_complete (void) { int r; @@ -205,12 +205,12 @@ nbd_config_complete (void) if (!export) export = ""; - if (shared && (shared_handle = nbd_open_handle (false)) == NULL) + if (shared && (shared_handle = nbdplug_open_handle (false)) == NULL) return -1; return 0; } -#define nbd_config_help \ +#define nbdplug_config_help \ "socket=<SOCKNAME> The Unix socket to connect to.\n" \ "hostname=<HOST> The hostname for the TCP socket to connect to.\n" \ "port=<PORT> TCP port or service name to use (default 10809).\n" \ @@ -265,7 +265,7 @@ write_full (int fd, const void *buf, size_t len) * resynchronizing with the server, and all further requests from the * client will fail. Returns -1 for convenience. */ static int -nbd_mark_dead (struct handle *h) +nbdplug_mark_dead (struct handle *h) { int err = errno; @@ -308,9 +308,9 @@ find_trans_by_cookie (struct handle *h, uint64_t cookie, bool remove) /* Send a request, return 0 on success or -1 on write failure. */ static int -nbd_request_raw (struct handle *h, uint16_t flags, uint16_t type, - uint64_t offset, uint32_t count, uint64_t cookie, - const void *buf) +nbdplug_request_raw (struct handle *h, uint16_t flags, uint16_t type, + uint64_t offset, uint32_t count, uint64_t cookie, + const void *buf) { struct request req = { .magic = htobe32 (NBD_REQUEST_MAGIC), @@ -335,9 +335,9 @@ nbd_request_raw (struct handle *h, uint16_t flags, uint16_t type, /* Perform the request half of a transaction. On success, return the transaction; on error return NULL. */ static struct transaction * -nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, - uint64_t offset, uint32_t count, const void *req_buf, - void *rep_buf, struct nbdkit_extents *extents) +nbdplug_request_full (struct handle *h, uint16_t flags, uint16_t type, + uint64_t offset, uint32_t count, const void *req_buf, + void *rep_buf, struct nbdkit_extents *extents) { int err; struct transaction *trans; @@ -367,7 +367,7 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, trans->next = h->trans; h->trans = trans; } - if (nbd_request_raw (h, flags, type, offset, count, cookie, req_buf) == 0) + if (nbdplug_request_raw (h, flags, type, offset, count, cookie, req_buf) == 0) return trans; trans = find_trans_by_cookie (h, cookie, true); @@ -376,17 +376,17 @@ nbd_request_full (struct handle *h, uint16_t flags, uint16_t type, if (sem_destroy (&trans->sem)) abort (); free (trans); - nbd_mark_dead (h); + nbdplug_mark_dead (h); errno = err; return NULL; } -/* Shorthand for nbd_request_full when no extra buffers are involved. */ +/* Shorthand for nbdplug_request_full when no extra buffers are involved. */ static struct transaction * -nbd_request (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset, - uint32_t count) +nbdplug_request (struct handle *h, uint16_t flags, uint16_t type, + uint64_t offset, uint32_t count) { - return nbd_request_full (h, flags, type, offset, count, NULL, NULL, NULL); + return nbdplug_request_full (h, flags, type, offset, count, NULL, NULL, NULL); } /* Read a reply, and look up the corresponding transaction. @@ -395,7 +395,7 @@ nbd_request (struct handle *h, uint16_t flags, uint16_t type, uint64_t offset, were negotiated, trans_out is set to NULL if there are still more replies expected. */ static int -nbd_reply_raw (struct handle *h, struct transaction **trans_out) +nbdplug_reply_raw (struct handle *h, struct transaction **trans_out) { union { struct simple_reply simple; @@ -418,7 +418,7 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) *trans_out = NULL; /* magic and handle overlap between simple and structured replies */ if (read_full (h->fd, &rep, sizeof rep.simple)) - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); rep.simple.magic = be32toh (rep.simple.magic); switch (rep.simple.magic) { case NBD_SIMPLE_REPLY_MAGIC: @@ -430,11 +430,11 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) case NBD_STRUCTURED_REPLY_MAGIC: if (!h->structured) { nbdkit_error ("structured response without negotiation"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (read_full (h->fd, sizeof rep.simple + (char *) &rep, sizeof rep - sizeof rep.simple)) - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); rep.structured.flags = be16toh (rep.structured.flags); rep.structured.type = be16toh (rep.structured.type); rep.structured.length = be32toh (rep.structured.length); @@ -445,7 +445,7 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) if (rep.structured.length > 64 * 1024 * 1024) { nbdkit_error ("structured reply length is suspiciously large: %" PRId32, rep.structured.length); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (rep.structured.length) { /* Special case for OFFSET_DATA in order to read tail of chunk @@ -456,10 +456,10 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) payload = malloc (len); if (!payload) { nbdkit_error ("reading structured reply payload: %m"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (read_full (h->fd, payload, len)) - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); len = 0; } more = !(rep.structured.flags & NBD_REPLY_FLAG_DONE); @@ -467,17 +467,17 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) case NBD_REPLY_TYPE_NONE: if (rep.structured.length) { nbdkit_error ("NBD_REPLY_TYPE_NONE with invalid payload"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (more) { nbdkit_error ("NBD_REPLY_TYPE_NONE without done flag"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } break; case NBD_REPLY_TYPE_OFFSET_DATA: if (rep.structured.length <= sizeof offset) { nbdkit_error ("structured reply OFFSET_DATA too small"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } memcpy (&offset, payload, sizeof offset); offset = be64toh (offset); @@ -486,7 +486,7 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) case NBD_REPLY_TYPE_OFFSET_HOLE: if (rep.structured.length != sizeof offset + sizeof len) { nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } memcpy (&offset, payload, sizeof offset); offset = be64toh (offset); @@ -494,19 +494,19 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) len = be32toh (len); if (!len) { nbdkit_error ("structured reply OFFSET_HOLE length incorrect"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } zero = true; break; case NBD_REPLY_TYPE_BLOCK_STATUS: if (!h->extents) { nbdkit_error ("block status response without negotiation"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (rep.structured.length < sizeof *extents || rep.structured.length % sizeof *extents != sizeof id) { nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } nextents = rep.structured.length / sizeof *extents; extents = (struct block_descriptor *) &payload[sizeof id]; @@ -519,18 +519,18 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) if (!NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) { nbdkit_error ("received unexpected structured reply %s", name_of_nbd_reply_type (rep.structured.type)); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (rep.structured.length < sizeof error + sizeof errlen) { nbdkit_error ("structured reply error size incorrect"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } memcpy (&errlen, payload + sizeof error, sizeof errlen); errlen = be16toh (errlen); if (errlen > rep.structured.length - sizeof error - sizeof errlen) { nbdkit_error ("structured reply error message size incorrect"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } memcpy (&error, payload, sizeof error); error = be32toh (error); @@ -547,13 +547,13 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) default: nbdkit_error ("received unexpected magic in reply: %#" PRIx32, rep.simple.magic); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } trans = find_trans_by_cookie (h, rep.simple.handle, !more); if (!trans) { nbdkit_error ("reply with unexpected cookie %#" PRIx64, rep.simple.handle); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } buf = trans->buf; @@ -561,7 +561,7 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) if (nextents) { if (!trans->extents) { nbdkit_error ("block status response to a non-status command"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } offset = trans->offset; for (size_t i = 0; i < nextents; i++) { @@ -577,17 +577,17 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) } if (buf && h->structured && rep.simple.magic == NBD_SIMPLE_REPLY_MAGIC) { nbdkit_error ("simple read reply when structured was expected"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (len) { if (!buf) { nbdkit_error ("structured read response to a non-read command"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } if (offset < trans->offset || offset > INT64_MAX || offset + len > trans->offset + count) { nbdkit_error ("structured read reply with unexpected offset/length"); - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); } buf = (char *) buf + offset - trans->offset; if (zero) { @@ -612,7 +612,7 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) switch (error) { case NBD_SUCCESS: if (buf && read_full (h->fd, buf, count)) - return nbd_mark_dead (h); + return nbdplug_mark_dead (h); return 0; case NBD_EPERM: return EPERM; @@ -636,7 +636,7 @@ nbd_reply_raw (struct handle *h, struct transaction **trans_out) /* Reader loop. */ void * -nbd_reader (void *handle) +nbdplug_reader (void *handle) { struct handle *h = handle; bool done = false; @@ -645,7 +645,7 @@ nbd_reader (void *handle) while (!done) { struct transaction *trans; - r = nbd_reply_raw (h, &trans); + r = nbdplug_reply_raw (h, &trans); if (r >= 0) { if (!trans) nbdkit_debug ("partial reply handled, waiting for final reply"); @@ -684,7 +684,7 @@ nbd_reader (void *handle) /* Perform the reply half of a transaction. */ static int -nbd_reply (struct handle *h, struct transaction *trans) +nbdplug_reply (struct handle *h, struct transaction *trans) { int err; @@ -713,9 +713,9 @@ nbd_reply (struct handle *h, struct transaction *trans) 0 on success, or -1 if communication to server is no longer possible. */ static int -nbd_newstyle_recv_option_reply (struct handle *h, uint32_t option, - struct fixed_new_option_reply *reply, - void **payload) +nbdplug_newstyle_recv_option_reply (struct handle *h, uint32_t option, + struct fixed_new_option_reply *reply, + void **payload) { CLEANUP_FREE char *buffer = NULL; @@ -771,7 +771,7 @@ nbd_newstyle_recv_option_reply (struct handle *h, uint32_t option, Return 1 if haggling completed, 0 if haggling failed but NBD_OPT_EXPORT_NAME is still viable, or -1 on inability to connect. */ static int -nbd_newstyle_haggle (struct handle *h) +nbdplug_newstyle_haggle (struct handle *h) { const char *const query = "base:allocation"; struct new_option opt; @@ -793,8 +793,8 @@ nbd_newstyle_haggle (struct handle *h) nbdkit_error ("unable to request NBD_OPT_STRUCTURED_REPLY: %m"); return -1; } - if (nbd_newstyle_recv_option_reply (h, NBD_OPT_STRUCTURED_REPLY, &reply, - NULL) < 0) + if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_STRUCTURED_REPLY, &reply, + NULL) < 0) return -1; if (reply.reply == NBD_REP_ACK) { nbdkit_debug ("structured replies enabled, trying NBD_OPT_SET_META_CONTEXT"); @@ -813,8 +813,8 @@ nbd_newstyle_haggle (struct handle *h) nbdkit_error ("unable to request NBD_OPT_SET_META_CONTEXT: %m"); return -1; } - if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, - NULL) < 0) + if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, + NULL) < 0) return -1; if (reply.reply == NBD_REP_META_CONTEXT) { /* Cheat: we asked for exactly one context. We could double @@ -826,8 +826,8 @@ nbd_newstyle_haggle (struct handle *h) the same id, without bothering to check further. */ nbdkit_debug ("extents enabled"); h->extents = true; - if (nbd_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, - NULL) < 0) + if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, + &reply, NULL) < 0) return -1; } if (reply.reply != NBD_REP_ACK) { @@ -861,7 +861,7 @@ nbd_newstyle_haggle (struct handle *h) struct fixed_new_option_reply_info_export *reply_export; uint16_t info; - if (nbd_newstyle_recv_option_reply (h, NBD_OPT_GO, &reply, &buffer) < 0) + if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_GO, &reply, &buffer) < 0) return -1; switch (reply.reply) { case NBD_REP_INFO: @@ -920,7 +920,7 @@ nbd_newstyle_haggle (struct handle *h) /* Connect to a Unix socket, returning the fd on success */ static int -nbd_connect_unix (void) +nbdplug_connect_unix (void) { struct sockaddr_un sock = { .sun_family = AF_UNIX }; int fd; @@ -932,7 +932,7 @@ nbd_connect_unix (void) return -1; } - /* We already validated length during nbd_config_complete */ + /* We already validated length during nbdplug_config_complete */ assert (strlen (sockname) <= sizeof sock.sun_path); memcpy (sock.sun_path, sockname, strlen (sockname)); if (connect (fd, (const struct sockaddr *) &sock, sizeof sock) < 0) { @@ -944,7 +944,7 @@ nbd_connect_unix (void) /* Connect to a TCP socket, returning the fd on success */ static int -nbd_connect_tcp (void) +nbdplug_connect_tcp (void) { struct addrinfo hints = { .ai_family = AF_UNSPEC, .ai_socktype = SOCK_STREAM, }; @@ -986,7 +986,7 @@ nbd_connect_tcp (void) /* Create the shared or per-connection handle. */ static struct handle * -nbd_open_handle (int readonly) +nbdplug_open_handle (int readonly) { struct handle *h; struct old_handshake old; @@ -1000,9 +1000,9 @@ nbd_open_handle (int readonly) retry: if (sockname) - h->fd = nbd_connect_unix (); + h->fd = nbdplug_connect_unix (); else - h->fd = nbd_connect_tcp (); + h->fd = nbdplug_connect_tcp (); if (h->fd == -1) { if (retry--) { sleep (1); @@ -1053,7 +1053,7 @@ nbd_open_handle (int readonly) /* Prefer NBD_OPT_GO if possible */ if (gflags & NBD_FLAG_FIXED_NEWSTYLE) { - int rc = nbd_newstyle_haggle (h); + int rc = nbdplug_newstyle_haggle (h); if (rc < 0) goto err; if (!rc) @@ -1099,7 +1099,7 @@ nbd_open_handle (int readonly) pthread_mutex_destroy (&h->write_lock); goto err; } - if ((errno = pthread_create (&h->reader, NULL, nbd_reader, h))) { + if ((errno = pthread_create (&h->reader, NULL, nbdplug_reader, h))) { nbdkit_error ("failed to initialize reader thread: %m"); pthread_mutex_destroy (&h->write_lock); pthread_mutex_destroy (&h->trans_lock); @@ -1117,19 +1117,19 @@ nbd_open_handle (int readonly) /* Create the per-connection handle. */ static void * -nbd_open (int readonly) +nbdplug_open (int readonly) { if (shared) return shared_handle; - return nbd_open_handle (readonly); + return nbdplug_open_handle (readonly); } /* Free up the shared or per-connection handle. */ static void -nbd_close_handle (struct handle *h) +nbdplug_close_handle (struct handle *h) { if (!h->dead) { - nbd_request_raw (h, 0, NBD_CMD_DISC, 0, 0, 0, NULL); + nbdplug_request_raw (h, 0, NBD_CMD_DISC, 0, 0, 0, NULL); shutdown (h->fd, SHUT_WR); } if ((errno = pthread_join (h->reader, NULL))) @@ -1142,17 +1142,17 @@ nbd_close_handle (struct handle *h) /* Free up the per-connection handle. */ static void -nbd_close (void *handle) +nbdplug_close (void *handle) { struct handle *h = handle; if (!shared) - nbd_close_handle (h); + nbdplug_close_handle (h); } /* Get the file size. */ static int64_t -nbd_get_size (void *handle) +nbdplug_get_size (void *handle) { struct handle *h = handle; @@ -1160,7 +1160,7 @@ nbd_get_size (void *handle) } static int -nbd_can_write (void *handle) +nbdplug_can_write (void *handle) { struct handle *h = handle; @@ -1168,7 +1168,7 @@ nbd_can_write (void *handle) } static int -nbd_can_flush (void *handle) +nbdplug_can_flush (void *handle) { struct handle *h = handle; @@ -1176,7 +1176,7 @@ nbd_can_flush (void *handle) } static int -nbd_is_rotational (void *handle) +nbdplug_is_rotational (void *handle) { struct handle *h = handle; @@ -1184,7 +1184,7 @@ nbd_is_rotational (void *handle) } static int -nbd_can_trim (void *handle) +nbdplug_can_trim (void *handle) { struct handle *h = handle; @@ -1192,7 +1192,7 @@ nbd_can_trim (void *handle) } static int -nbd_can_zero (void *handle) +nbdplug_can_zero (void *handle) { struct handle *h = handle; @@ -1200,7 +1200,7 @@ nbd_can_zero (void *handle) } static int -nbd_can_fua (void *handle) +nbdplug_can_fua (void *handle) { struct handle *h = handle; @@ -1208,7 +1208,7 @@ nbd_can_fua (void *handle) } static int -nbd_can_multi_conn (void *handle) +nbdplug_can_multi_conn (void *handle) { struct handle *h = handle; @@ -1216,7 +1216,7 @@ nbd_can_multi_conn (void *handle) } static int -nbd_can_cache (void *handle) +nbdplug_can_cache (void *handle) { struct handle *h = handle; @@ -1226,7 +1226,7 @@ nbd_can_cache (void *handle) } static int -nbd_can_extents (void *handle) +nbdplug_can_extents (void *handle) { struct handle *h = handle; @@ -1235,38 +1235,38 @@ nbd_can_extents (void *handle) /* Read data from the file. */ static int -nbd_pread (void *handle, void *buf, uint32_t count, uint64_t offset, - uint32_t flags) +nbdplug_pread (void *handle, void *buf, uint32_t count, uint64_t offset, + uint32_t flags) { struct handle *h = handle; struct transaction *s; assert (!flags); - s = nbd_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf, NULL); - return nbd_reply (h, s); + s = nbdplug_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf, NULL); + return nbdplug_reply (h, s); } /* Write data to the file. */ static int -nbd_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, - uint32_t flags) +nbdplug_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, + uint32_t flags) { struct handle *h = handle; struct transaction *s; assert (!(flags & ~NBDKIT_FLAG_FUA)); - s = nbd_request_full (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, - NBD_CMD_WRITE, offset, count, buf, NULL, NULL); - return nbd_reply (h, s); + s = nbdplug_request_full (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, + NBD_CMD_WRITE, offset, count, buf, NULL, NULL); + return nbdplug_reply (h, s); } /* Write zeroes to the file. */ static int -nbd_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +nbdplug_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) { struct handle *h = handle; struct transaction *s; - int f = 0; + uint32_t f = 0; assert (!(flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM))); assert (h->flags & NBD_FLAG_SEND_WRITE_ZEROES); @@ -1275,89 +1275,89 @@ nbd_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) f |= NBD_CMD_FLAG_NO_HOLE; if (flags & NBDKIT_FLAG_FUA) f |= NBD_CMD_FLAG_FUA; - s = nbd_request (h, f, NBD_CMD_WRITE_ZEROES, offset, count); - return nbd_reply (h, s); + s = nbdplug_request (h, f, NBD_CMD_WRITE_ZEROES, offset, count); + return nbdplug_reply (h, s); } /* Trim a portion of the file. */ static int -nbd_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +nbdplug_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags) { struct handle *h = handle; struct transaction *s; assert (!(flags & ~NBDKIT_FLAG_FUA)); - s = nbd_request (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, - NBD_CMD_TRIM, offset, count); - return nbd_reply (h, s); + s = nbdplug_request (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, + NBD_CMD_TRIM, offset, count); + return nbdplug_reply (h, s); } /* Flush the file to disk. */ static int -nbd_flush (void *handle, uint32_t flags) +nbdplug_flush (void *handle, uint32_t flags) { struct handle *h = handle; struct transaction *s; assert (!flags); - s = nbd_request (h, 0, NBD_CMD_FLUSH, 0, 0); - return nbd_reply (h, s); + s = nbdplug_request (h, 0, NBD_CMD_FLUSH, 0, 0); + return nbdplug_reply (h, s); } /* Read extents of the file. */ static int -nbd_extents (void *handle, uint32_t count, uint64_t offset, - uint32_t flags, struct nbdkit_extents *extents) +nbdplug_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) { struct handle *h = handle; struct transaction *s; + uint32_t f = flags & NBDKIT_FLAG_REQ_ONE ? NBD_CMD_FLAG_REQ_ONE : 0; assert (!(flags & ~NBDKIT_FLAG_REQ_ONE) && h->extents); - s = nbd_request_full (h, flags & NBDKIT_FLAG_REQ_ONE ? NBD_CMD_FLAG_REQ_ONE : 0, - NBD_CMD_BLOCK_STATUS, offset, count, NULL, NULL, - extents); - return nbd_reply (h, s); + s = nbdplug_request_full (h, f, NBD_CMD_BLOCK_STATUS, offset, count, NULL, + NULL, extents); + return nbdplug_reply (h, s); } /* Cache a portion of the file. */ static int -nbd_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +nbdplug_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) { struct handle *h = handle; struct transaction *s; assert (!flags); - s = nbd_request (h, 0, NBD_CMD_CACHE, offset, count); - return nbd_reply (h, s); + s = nbdplug_request (h, 0, NBD_CMD_CACHE, offset, count); + return nbdplug_reply (h, s); } static struct nbdkit_plugin plugin = { .name = "nbd", .longname = "nbdkit nbd plugin", .version = PACKAGE_VERSION, - .unload = nbd_unload, - .config = nbd_config, - .config_complete = nbd_config_complete, - .config_help = nbd_config_help, - .open = nbd_open, - .close = nbd_close, - .get_size = nbd_get_size, - .can_write = nbd_can_write, - .can_flush = nbd_can_flush, - .is_rotational = nbd_is_rotational, - .can_trim = nbd_can_trim, - .can_zero = nbd_can_zero, - .can_fua = nbd_can_fua, - .can_multi_conn = nbd_can_multi_conn, - .can_extents = nbd_can_extents, - .can_cache = nbd_can_cache, - .pread = nbd_pread, - .pwrite = nbd_pwrite, - .zero = nbd_zero, - .flush = nbd_flush, - .trim = nbd_trim, - .extents = nbd_extents, - .cache = nbd_cache, + .unload = nbdplug_unload, + .config = nbdplug_config, + .config_complete = nbdplug_config_complete, + .config_help = nbdplug_config_help, + .open = nbdplug_open, + .close = nbdplug_close, + .get_size = nbdplug_get_size, + .can_write = nbdplug_can_write, + .can_flush = nbdplug_can_flush, + .is_rotational = nbdplug_is_rotational, + .can_trim = nbdplug_can_trim, + .can_zero = nbdplug_can_zero, + .can_fua = nbdplug_can_fua, + .can_multi_conn = nbdplug_can_multi_conn, + .can_extents = nbdplug_can_extents, + .can_cache = nbdplug_can_cache, + .pread = nbdplug_pread, + .pwrite = nbdplug_pwrite, + .zero = nbdplug_zero, + .flush = nbdplug_flush, + .trim = nbdplug_trim, + .extents = nbdplug_extents, + .cache = nbdplug_cache, .errno_is_preserved = 1, }; -- 2.20.1
This conversion should be feature compatible with the standalone nbd code. Note that the use of libnbd makes the binary for this particular plugin fall under an LGPLv2+ license rather than BSD; but the source code in nbd.c remains BSD. A lot of code simply disappears, now that I'm no longer directly utilizing the NBD protocol files but relying on libnbd. Coordination between threads from nbdkit and a central state machine thread is gated by a lock around the pending transition list, using the same trick as before of using a semaphore to wake up the right thread after the server's reply, regardless of out-of-order handling from the server. Benchmark-wise, using the same setup as in commit e897ed70, I see an order-of-magnitude slowdown: Pre-patch, the runs averaged 1.266s, 1.30E+08 bits/s Post-patch, the runs averaged 11.663s, 1.41E+07 bits/s This will need further profiling to determine how much is nbdkit's fault, and how much is libnbd's. I think that we are probably holding some locks for too long, resulting in some serialized performance. Also, the standalone code was able to run read of command 1 in parallel with write of command 2 via separate threads, whereas libnbd's state machine is serializing everything (whether or not the state machine spreads out the I/O to do writes from the thread calling nbd_aio_FOO and reads from the reader thread, the two are never run at once). The trickiest part (for me) was the fact that since the state machine loop is in a separate thread from the initial requests, the loop is often blocked on just POLLIN for the state machine fd. My initial attempt tried to grab the transaction lock before calling nbd_aio_FOO to create the cookie, and generally worked where the server's reply would then wake up the reader. But that was even slower (due to a larger lock section), and also hit a problem where large NBD_CMD_WRITE would block on writes, which the poll() was not expecting (best seen when running tests/test-nozero.sh with LIBNBD_DEBUG=1). I tried blindly setting POLLOUT as a valid reason to break the poll without regard to the current state, but that burned a lot of CPU as it fired even when there was no progress. So I finally settled on using a pipe-to-self and poll()ing on two fds at once (the libnbd state machine and my pipe) to know when the loop needed to start processing because another thread started a command, where the pipe-to-self also makes it possible to not need to hold the transaction lock while generating a cookie. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbd.c | 1102 +++++++++------------------------------ plugins/nbd/Makefile.am | 8 +- 2 files changed, 262 insertions(+), 848 deletions(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 4daf8a4..b1e978a 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -49,43 +49,35 @@ #include <assert.h> #include <pthread.h> #include <semaphore.h> +#include <poll.h> + +#include <libnbd.h> #define NBDKIT_API_VERSION 2 #include <nbdkit-plugin.h> -#include "protocol.h" #include "byte-swapping.h" #include "cleanup.h" /* The per-transaction details */ struct transaction { - uint64_t cookie; + int64_t cookie; sem_t sem; - void *buf; - uint64_t offset; - uint32_t count; uint32_t err; - struct nbdkit_extents *extents; struct transaction *next; }; /* The per-connection handle */ struct handle { /* These fields are read-only once initialized */ - int fd; - int flags; - int64_t size; - bool structured; - bool extents; + struct nbd_handle *nbd; + int fd; /* Cache of nbd_aio_get_fd */ + int fds[2]; /* Pipe for kicking the reader thread */ + bool readonly; pthread_t reader; - /* Prevents concurrent threads from interleaving writes to server */ - pthread_mutex_t write_lock; - - pthread_mutex_t trans_lock; /* Covers access to all fields below */ - struct transaction *trans; - uint64_t unique; - bool dead; + pthread_mutex_t trans_lock; /* Covers access to trans list */ + struct transaction *trans; /* List of pending transactions */ }; /* Connect to server via absolute name of Unix socket */ @@ -218,451 +210,74 @@ nbdplug_config_complete (void) #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL -/* Read an entire buffer, returning 0 on success or -1 with errno set. */ -static int -read_full (int fd, void *buf, size_t len) -{ - ssize_t r; - - while (len) { - r = read (fd, buf, len); - if (r < 0) { - if (errno == EINTR || errno == EAGAIN) - continue; - return -1; - } - if (!r) { - /* Unexpected EOF */ - errno = EBADMSG; - return -1; - } - buf += r; - len -= r; - } - return 0; -} - -/* Write an entire buffer, returning 0 on success or -1 with errno set. */ -static int -write_full (int fd, const void *buf, size_t len) -{ - ssize_t r; - - while (len) { - r = write (fd, buf, len); - if (r < 0) { - if (errno == EINTR || errno == EAGAIN) - continue; - return -1; - } - buf += r; - len -= r; - } - return 0; -} - -/* Called during transmission phases when there is no hope of - * resynchronizing with the server, and all further requests from the - * client will fail. Returns -1 for convenience. */ -static int -nbdplug_mark_dead (struct handle *h) -{ - int err = errno; - - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); - if (!h->dead) { - nbdkit_debug ("permanent failure while talking to server %s: %m", - servname); - h->dead = true; - } - else if (!err) - errno = ESHUTDOWN; - /* NBD only accepts a limited set of errno values over the wire, and - nbdkit converts all other values to EINVAL. If we died due to an - errno value that cannot transmit over the wire, translate it to - ESHUTDOWN instead. */ - if (err == EPIPE || err == EBADMSG) - nbdkit_set_error (ESHUTDOWN); - return -1; -} - -/* Find and possibly remove the transaction corresponding to cookie - from the list. */ -static struct transaction * -find_trans_by_cookie (struct handle *h, uint64_t cookie, bool remove) -{ - struct transaction **ptr; - struct transaction *trans; - - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); - ptr = &h->trans; - while ((trans = *ptr) != NULL) { - if (cookie == trans->cookie) - break; - ptr = &trans->next; - } - if (trans && remove) - *ptr = trans->next; - return trans; -} - -/* Send a request, return 0 on success or -1 on write failure. */ -static int -nbdplug_request_raw (struct handle *h, uint16_t flags, uint16_t type, - uint64_t offset, uint32_t count, uint64_t cookie, - const void *buf) -{ - struct request req = { - .magic = htobe32 (NBD_REQUEST_MAGIC), - .flags = htobe16 (flags), - .type = htobe16 (type), - .handle = cookie, /* Opaque to server, so endianness doesn't matter */ - .offset = htobe64 (offset), - .count = htobe32 (count), - }; - int r; - - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->write_lock); - nbdkit_debug ("sending request type %d (%s), flags %#x, offset %#" PRIx64 - ", count %#x, cookie %#" PRIx64, type, name_of_nbd_cmd (type), - flags, offset, count, cookie); - r = write_full (h->fd, &req, sizeof req); - if (buf && !r) - r = write_full (h->fd, buf, count); - return r; -} - -/* Perform the request half of a transaction. On success, return the - transaction; on error return NULL. */ -static struct transaction * -nbdplug_request_full (struct handle *h, uint16_t flags, uint16_t type, - uint64_t offset, uint32_t count, const void *req_buf, - void *rep_buf, struct nbdkit_extents *extents) -{ - int err; - struct transaction *trans; - uint64_t cookie; - - trans = calloc (1, sizeof *trans); - if (!trans) { - nbdkit_error ("unable to track transaction: %m"); - /* Still in sync with server, so don't mark connection dead */ - return NULL; - } - if (sem_init (&trans->sem, 0, 0)) { - nbdkit_error ("unable to create semaphore: %m"); - /* Still in sync with server, so don't mark connection dead */ - free (trans); - return NULL; - } - trans->buf = rep_buf; - trans->count = rep_buf ? count : 0; - trans->offset = offset; - trans->extents = extents; - { - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); - if (h->dead) - goto err; - cookie = trans->cookie = h->unique++; - trans->next = h->trans; - h->trans = trans; - } - if (nbdplug_request_raw (h, flags, type, offset, count, cookie, req_buf) == 0) - return trans; - trans = find_trans_by_cookie (h, cookie, true); - - err: - err = errno; - if (sem_destroy (&trans->sem)) - abort (); - free (trans); - nbdplug_mark_dead (h); - errno = err; - return NULL; -} - -/* Shorthand for nbdplug_request_full when no extra buffers are involved. */ -static struct transaction * -nbdplug_request (struct handle *h, uint16_t flags, uint16_t type, - uint64_t offset, uint32_t count) -{ - return nbdplug_request_full (h, flags, type, offset, count, NULL, NULL, NULL); -} - -/* Read a reply, and look up the corresponding transaction. - Return the server's non-negative answer (converted to local errno - value) on success, or -1 on read failure. If structured replies - were negotiated, trans_out is set to NULL if there are still more replies - expected. */ -static int -nbdplug_reply_raw (struct handle *h, struct transaction **trans_out) -{ - union { - struct simple_reply simple; - struct structured_reply structured; - } rep; - struct transaction *trans; - void *buf = NULL; - CLEANUP_FREE char *payload = NULL; - uint32_t count; - uint32_t id; - struct block_descriptor *extents = NULL; - size_t nextents = 0; - int error = NBD_SUCCESS; - bool more = false; - uint32_t len = 0; /* 0 except for structured reads */ - uint64_t offset = 0; /* if len, absolute offset of structured read chunk */ - bool zero = false; /* if len, whether to read or memset */ - uint16_t errlen; - - *trans_out = NULL; - /* magic and handle overlap between simple and structured replies */ - if (read_full (h->fd, &rep, sizeof rep.simple)) - return nbdplug_mark_dead (h); - rep.simple.magic = be32toh (rep.simple.magic); - switch (rep.simple.magic) { - case NBD_SIMPLE_REPLY_MAGIC: - nbdkit_debug ("received simple reply for cookie %#" PRIx64 ", status %s", - rep.simple.handle, - name_of_nbd_error (be32toh (rep.simple.error))); - error = be32toh (rep.simple.error); - break; - case NBD_STRUCTURED_REPLY_MAGIC: - if (!h->structured) { - nbdkit_error ("structured response without negotiation"); - return nbdplug_mark_dead (h); - } - if (read_full (h->fd, sizeof rep.simple + (char *) &rep, - sizeof rep - sizeof rep.simple)) - return nbdplug_mark_dead (h); - rep.structured.flags = be16toh (rep.structured.flags); - rep.structured.type = be16toh (rep.structured.type); - rep.structured.length = be32toh (rep.structured.length); - nbdkit_debug ("received structured reply %s for cookie %#" PRIx64 - ", payload length %" PRId32, - name_of_nbd_reply_type (rep.structured.type), - rep.structured.handle, rep.structured.length); - if (rep.structured.length > 64 * 1024 * 1024) { - nbdkit_error ("structured reply length is suspiciously large: %" PRId32, - rep.structured.length); - return nbdplug_mark_dead (h); - } - if (rep.structured.length) { - /* Special case for OFFSET_DATA in order to read tail of chunk - directly into final buffer later on */ - len = (rep.structured.type == NBD_REPLY_TYPE_OFFSET_DATA && - rep.structured.length > sizeof offset) ? sizeof offset : - rep.structured.length; - payload = malloc (len); - if (!payload) { - nbdkit_error ("reading structured reply payload: %m"); - return nbdplug_mark_dead (h); - } - if (read_full (h->fd, payload, len)) - return nbdplug_mark_dead (h); - len = 0; - } - more = !(rep.structured.flags & NBD_REPLY_FLAG_DONE); - switch (rep.structured.type) { - case NBD_REPLY_TYPE_NONE: - if (rep.structured.length) { - nbdkit_error ("NBD_REPLY_TYPE_NONE with invalid payload"); - return nbdplug_mark_dead (h); - } - if (more) { - nbdkit_error ("NBD_REPLY_TYPE_NONE without done flag"); - return nbdplug_mark_dead (h); - } - break; - case NBD_REPLY_TYPE_OFFSET_DATA: - if (rep.structured.length <= sizeof offset) { - nbdkit_error ("structured reply OFFSET_DATA too small"); - return nbdplug_mark_dead (h); - } - memcpy (&offset, payload, sizeof offset); - offset = be64toh (offset); - len = rep.structured.length - sizeof offset; - break; - case NBD_REPLY_TYPE_OFFSET_HOLE: - if (rep.structured.length != sizeof offset + sizeof len) { - nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); - return nbdplug_mark_dead (h); - } - memcpy (&offset, payload, sizeof offset); - offset = be64toh (offset); - memcpy (&len, payload, sizeof len); - len = be32toh (len); - if (!len) { - nbdkit_error ("structured reply OFFSET_HOLE length incorrect"); - return nbdplug_mark_dead (h); - } - zero = true; - break; - case NBD_REPLY_TYPE_BLOCK_STATUS: - if (!h->extents) { - nbdkit_error ("block status response without negotiation"); - return nbdplug_mark_dead (h); - } - if (rep.structured.length < sizeof *extents || - rep.structured.length % sizeof *extents != sizeof id) { - nbdkit_error ("structured reply OFFSET_HOLE size incorrect"); - return nbdplug_mark_dead (h); - } - nextents = rep.structured.length / sizeof *extents; - extents = (struct block_descriptor *) &payload[sizeof id]; - memcpy (&id, payload, sizeof id); - id = be32toh (id); - nbdkit_debug ("parsing %zu extents for context id %" PRId32, - nextents, id); - break; - default: - if (!NBD_REPLY_TYPE_IS_ERR (rep.structured.type)) { - nbdkit_error ("received unexpected structured reply %s", - name_of_nbd_reply_type (rep.structured.type)); - return nbdplug_mark_dead (h); - } - - if (rep.structured.length < sizeof error + sizeof errlen) { - nbdkit_error ("structured reply error size incorrect"); - return nbdplug_mark_dead (h); - } - memcpy (&errlen, payload + sizeof error, sizeof errlen); - errlen = be16toh (errlen); - if (errlen > rep.structured.length - sizeof error - sizeof errlen) { - nbdkit_error ("structured reply error message size incorrect"); - return nbdplug_mark_dead (h); - } - memcpy (&error, payload, sizeof error); - error = be32toh (error); - if (errlen) - nbdkit_debug ("received structured error %s with message: %.*s", - name_of_nbd_error (error), (int) errlen, - payload + sizeof error + sizeof errlen); - else - nbdkit_debug ("received structured error %s without message", - name_of_nbd_error (error)); - } - break; - - default: - nbdkit_error ("received unexpected magic in reply: %#" PRIx32, - rep.simple.magic); - return nbdplug_mark_dead (h); - } - - trans = find_trans_by_cookie (h, rep.simple.handle, !more); - if (!trans) { - nbdkit_error ("reply with unexpected cookie %#" PRIx64, rep.simple.handle); - return nbdplug_mark_dead (h); - } - - buf = trans->buf; - count = trans->count; - if (nextents) { - if (!trans->extents) { - nbdkit_error ("block status response to a non-status command"); - return nbdplug_mark_dead (h); - } - offset = trans->offset; - for (size_t i = 0; i < nextents; i++) { - /* We rely on the fact that NBDKIT_EXTENT_* match NBD_STATE_* */ - if (nbdkit_add_extent (trans->extents, offset, - be32toh (extents[i].length), - be32toh (extents[i].status_flags)) == -1) { - error = errno; - break; - } - offset += be32toh (extents[i].length); - } - } - if (buf && h->structured && rep.simple.magic == NBD_SIMPLE_REPLY_MAGIC) { - nbdkit_error ("simple read reply when structured was expected"); - return nbdplug_mark_dead (h); - } - if (len) { - if (!buf) { - nbdkit_error ("structured read response to a non-read command"); - return nbdplug_mark_dead (h); - } - if (offset < trans->offset || offset > INT64_MAX || - offset + len > trans->offset + count) { - nbdkit_error ("structured read reply with unexpected offset/length"); - return nbdplug_mark_dead (h); - } - buf = (char *) buf + offset - trans->offset; - if (zero) { - memset (buf, 0, len); - buf = NULL; - } - else - count = len; - } - - /* Thanks to structured replies, we must preserve an error in any - earlier chunk for replay during the final chunk. */ - if (!more) { - *trans_out = trans; - if (!error) - error = trans->err; - } - else if (error && !trans->err) - trans->err = error; - - /* Convert from wire value to local errno, and perform any final read */ - switch (error) { - case NBD_SUCCESS: - if (buf && read_full (h->fd, buf, count)) - return nbdplug_mark_dead (h); - return 0; - case NBD_EPERM: - return EPERM; - case NBD_EIO: - return EIO; - case NBD_ENOMEM: - return ENOMEM; - default: - nbdkit_debug ("unexpected error %d, squashing to EINVAL", error); - /* fallthrough */ - case NBD_EINVAL: - return EINVAL; - case NBD_ENOSPC: - return ENOSPC; - case NBD_EOVERFLOW: - return EOVERFLOW; - case NBD_ESHUTDOWN: - return ESHUTDOWN; - } -} - /* Reader loop. */ void * nbdplug_reader (void *handle) { struct handle *h = handle; - bool done = false; int r; - while (!done) { - struct transaction *trans; - - r = nbdplug_reply_raw (h, &trans); - if (r >= 0) { - if (!trans) - nbdkit_debug ("partial reply handled, waiting for final reply"); - else { - trans->err = r; + while (!nbd_aio_is_dead (h->nbd) && !nbd_aio_is_closed (h->nbd)) { + struct pollfd fds[2] = { + [0].fd = h->fd, + [1].fd = h->fds[0], + [1].events = POLLIN, + }; + struct transaction *trans, **prev; + int dir; + char c; + + dir = nbd_aio_get_direction (h->nbd); + nbdkit_debug ("polling, dir=%d", dir); + if (dir & LIBNBD_AIO_DIRECTION_READ) + fds[0].events |= POLLIN; + if (dir & LIBNBD_AIO_DIRECTION_WRITE) + fds[0].events |= POLLOUT; + if (poll (fds, 2, -1) == -1) { + nbdkit_error ("poll: %m"); + break; + } + + if (dir & LIBNBD_AIO_DIRECTION_READ && fds[0].revents & POLLIN) + nbd_aio_notify_read (h->nbd); + else if (dir & LIBNBD_AIO_DIRECTION_WRITE && fds[0].revents & POLLOUT) + nbd_aio_notify_write (h->nbd); + + /* Check if we were kicked because a command was started */ + if (fds[1].revents & POLLIN && read (h->fds[0], &c, 1) != 1) { + nbdkit_error ("failed to read pipe: %m"); + break; + } + + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); + trans = h->trans; + prev = &h->trans; + while (trans) { + r = nbd_aio_command_completed (h->nbd, trans->cookie); + if (r == -1) { + nbdkit_debug ("transaction %" PRId64 " failed: %s", trans->cookie, + nbd_get_error ()); + trans->err = nbd_get_errno (); + if (!trans->err) + trans->err = EIO; + } + if (r) { + nbdkit_debug ("cookie %" PRId64 " completed state machine, status %d", + trans->cookie, trans->err); + *prev = trans->next; if (sem_post (&trans->sem)) { nbdkit_error ("failed to post semaphore: %m"); abort (); } } + else + prev = &trans->next; + trans = *prev; } - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); - done = h->dead; } /* Clean up any stranded in-flight requests */ - r = ESHUTDOWN; + nbdkit_debug ("state machine changed to %s", nbd_connection_state (h->nbd)); while (1) { struct transaction *trans; @@ -673,15 +288,63 @@ nbdplug_reader (void *handle) } if (!trans) break; - trans->err = r; + r = nbd_aio_command_completed (h->nbd, trans->cookie); + if (r == -1) { + nbdkit_debug ("transaction %" PRId64 " failed: %s", trans->cookie, + nbd_get_error ()); + trans->err = nbd_get_errno (); + if (!trans->err) + trans->err = ESHUTDOWN; + } + else if (!r) + trans->err = ESHUTDOWN; if (sem_post (&trans->sem)) { nbdkit_error ("failed to post semaphore: %m"); abort (); } } + nbdkit_debug ("exiting state machine thread"); return NULL; } +/* Register a cookie and return a transaction. */ +static struct transaction * +nbdplug_register (struct handle *h, int64_t cookie) +{ + struct transaction *trans; + char c = 0; + + if (cookie == -1) { + nbdkit_error ("command failed: %s", nbd_get_error ()); + errno = nbd_get_errno (); + return NULL; + } + + nbdkit_debug ("cookie %" PRId64 " started by state machine", cookie); + trans = calloc (1, sizeof *trans); + if (!trans) { + nbdkit_error ("unable to track transaction: %m"); + return NULL; + } + + /* While locked, kick the reader thread and add our transaction */ + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&h->trans_lock); + if (write (h->fds[1], &c, 1) != 1) { + nbdkit_error ("write to pipe: %m"); + free (trans); + return NULL; + } + if (sem_init (&trans->sem, 0, 0)) { + nbdkit_error ("unable to create semaphore: %m"); + free (trans); + return NULL; + } + trans->cookie = cookie; + trans->next = h->trans; + h->trans = trans; + return trans; +} + /* Perform the reply half of a transaction. */ static int nbdplug_reply (struct handle *h, struct transaction *trans) @@ -708,400 +371,60 @@ nbdplug_reply (struct handle *h, struct transaction *trans) return err ? -1 : 0; } -/* Receive response to @option into @reply, and consume any - payload. If @payload is non-NULL, caller must free *payload. Return - 0 on success, or -1 if communication to server is no longer - possible. */ -static int -nbdplug_newstyle_recv_option_reply (struct handle *h, uint32_t option, - struct fixed_new_option_reply *reply, - void **payload) -{ - CLEANUP_FREE char *buffer = NULL; - - if (payload) - *payload = NULL; - if (read_full (h->fd, reply, sizeof *reply)) { - nbdkit_error ("unable to read option reply: %m"); - return -1; - } - reply->magic = be64toh (reply->magic); - reply->option = be32toh (reply->option); - reply->reply = be32toh (reply->reply); - reply->replylen = be32toh (reply->replylen); - if (reply->magic != NBD_REP_MAGIC || reply->option != option) { - nbdkit_error ("unexpected option reply"); - return -1; - } - if (reply->replylen) { - if (reply->reply == NBD_REP_ACK) { - nbdkit_error ("NBD_REP_ACK should not have replylen %" PRId32, - reply->replylen); - return -1; - } - if (reply->replylen > 16 * 1024 * 1024) { - nbdkit_error ("option reply length is suspiciously large: %" PRId32, - reply->replylen); - return -1; - } - /* buffer is a string for NBD_REP_ERR_*; adding a NUL terminator - makes that string easier to use, without hurting other reply - types where buffer is not a string */ - buffer = malloc (reply->replylen + 1); - if (!buffer) { - nbdkit_error ("malloc: %m"); - return -1; - } - if (read_full (h->fd, buffer, reply->replylen)) { - nbdkit_error ("unable to read option reply payload: %m"); - return -1; - } - buffer[reply->replylen] = '\0'; - if (!payload) - nbdkit_debug ("ignoring option reply payload"); - else { - *payload = buffer; - buffer = NULL; - } - } - return 0; -} - -/* Attempt to negotiate structured reads, block status, and NBD_OPT_GO. - Return 1 if haggling completed, 0 if haggling failed but - NBD_OPT_EXPORT_NAME is still viable, or -1 on inability to connect. */ -static int -nbdplug_newstyle_haggle (struct handle *h) -{ - const char *const query = "base:allocation"; - struct new_option opt; - uint32_t exportnamelen = htobe32 (strlen (export)); - uint32_t nrqueries = htobe32 (1); - uint32_t querylen = htobe32 (strlen (query)); - /* For now, we make no NBD_INFO_* requests, relying on the server to - send its defaults. TODO: nbdkit should let plugins report block - sizes, at which point we should request NBD_INFO_BLOCK_SIZE and - obey any sizes set by server. */ - uint16_t nrinfos = htobe16 (0); - struct fixed_new_option_reply reply; - - nbdkit_debug ("trying NBD_OPT_STRUCTURED_REPLY"); - opt.version = htobe64 (NEW_VERSION); - opt.option = htobe32 (NBD_OPT_STRUCTURED_REPLY); - opt.optlen = htobe32 (0); - if (write_full (h->fd, &opt, sizeof opt)) { - nbdkit_error ("unable to request NBD_OPT_STRUCTURED_REPLY: %m"); - return -1; - } - if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_STRUCTURED_REPLY, &reply, - NULL) < 0) - return -1; - if (reply.reply == NBD_REP_ACK) { - nbdkit_debug ("structured replies enabled, trying NBD_OPT_SET_META_CONTEXT"); - h->structured = true; - - opt.version = htobe64 (NEW_VERSION); - opt.option = htobe32 (NBD_OPT_SET_META_CONTEXT); - opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) + - sizeof nrqueries + sizeof querylen + strlen (query)); - if (write_full (h->fd, &opt, sizeof opt) || - write_full (h->fd, &exportnamelen, sizeof exportnamelen) || - write_full (h->fd, export, strlen (export)) || - write_full (h->fd, &nrqueries, sizeof nrqueries) || - write_full (h->fd, &querylen, sizeof querylen) || - write_full (h->fd, query, strlen (query))) { - nbdkit_error ("unable to request NBD_OPT_SET_META_CONTEXT: %m"); - return -1; - } - if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, &reply, - NULL) < 0) - return -1; - if (reply.reply == NBD_REP_META_CONTEXT) { - /* Cheat: we asked for exactly one context. We could double - check that the server is replying with exactly the - "base:allocation" context, and then remember the id it tells - us to later confirm that responses to NBD_CMD_BLOCK_STATUS - match up; but in the absence of multiple contexts, it's - easier to just assume the server is compliant, and will reuse - the same id, without bothering to check further. */ - nbdkit_debug ("extents enabled"); - h->extents = true; - if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_SET_META_CONTEXT, - &reply, NULL) < 0) - return -1; - } - if (reply.reply != NBD_REP_ACK) { - if (h->extents) { - nbdkit_error ("unexpected response to set meta context"); - return -1; - } - nbdkit_debug ("ignoring meta context response %s", - name_of_nbd_rep (reply.reply)); - } - } - else { - nbdkit_debug ("structured replies disabled"); - } - - /* Try NBD_OPT_GO */ - nbdkit_debug ("trying NBD_OPT_GO"); - opt.version = htobe64 (NEW_VERSION); - opt.option = htobe32 (NBD_OPT_GO); - opt.optlen = htobe32 (sizeof exportnamelen + strlen (export) + - sizeof nrinfos); - if (write_full (h->fd, &opt, sizeof opt) || - write_full (h->fd, &exportnamelen, sizeof exportnamelen) || - write_full (h->fd, export, strlen (export)) || - write_full (h->fd, &nrinfos, sizeof nrinfos)) { - nbdkit_error ("unable to request NBD_OPT_GO: %m"); - return -1; - } - while (1) { - CLEANUP_FREE void *buffer; - struct fixed_new_option_reply_info_export *reply_export; - uint16_t info; - - if (nbdplug_newstyle_recv_option_reply (h, NBD_OPT_GO, &reply, &buffer) < 0) - return -1; - switch (reply.reply) { - case NBD_REP_INFO: - /* Parse payload, but ignore all except NBD_INFO_EXPORT */ - if (reply.replylen < 2) { - nbdkit_error ("NBD_REP_INFO reply too short"); - return -1; - } - memcpy (&info, buffer, sizeof info); - info = be16toh (info); - switch (info) { - case NBD_INFO_EXPORT: - if (reply.replylen != sizeof *reply_export) { - nbdkit_error ("NBD_INFO_EXPORT reply wrong size"); - return -1; - } - reply_export = buffer; - h->size = be64toh (reply_export->exportsize); - h->flags = be16toh (reply_export->eflags); - break; - default: - nbdkit_debug ("ignoring server info %d", info); - } - break; - case NBD_REP_ACK: - /* End of replies, valid if server already sent NBD_INFO_EXPORT, - observable since h->flags must contain NBD_FLAG_HAS_FLAGS */ - assert (!buffer); - if (!h->flags) { - nbdkit_error ("server omitted NBD_INFO_EXPORT reply to NBD_OPT_GO"); - return -1; - } - nbdkit_debug ("NBD_OPT_GO complete"); - return 1; - case NBD_REP_ERR_UNSUP: - /* Special case this failure to fall back to NBD_OPT_EXPORT_NAME */ - nbdkit_debug ("server lacks NBD_OPT_GO support"); - return 0; - default: - /* Unexpected. Either the server sent a legitimate error or an - unexpected reply, but either way, we can't connect. */ - if (NBD_REP_IS_ERR (reply.reply)) - if (reply.replylen) - nbdkit_error ("server rejected NBD_OPT_GO with %s: %s", - name_of_nbd_rep (reply.reply), (char *) buffer); - else - nbdkit_error ("server rejected NBD_OPT_GO with %s", - name_of_nbd_rep (reply.reply)); - else - nbdkit_error ("server used unexpected reply %s to NBD_OPT_GO", - name_of_nbd_rep (reply.reply)); - return -1; - } - } -} - -/* Connect to a Unix socket, returning the fd on success */ -static int -nbdplug_connect_unix (void) -{ - struct sockaddr_un sock = { .sun_family = AF_UNIX }; - int fd; - - nbdkit_debug ("connecting to Unix socket name=%s", sockname); - fd = socket (AF_UNIX, SOCK_STREAM, 0); - if (fd < 0) { - nbdkit_error ("socket: %m"); - return -1; - } - - /* We already validated length during nbdplug_config_complete */ - assert (strlen (sockname) <= sizeof sock.sun_path); - memcpy (sock.sun_path, sockname, strlen (sockname)); - if (connect (fd, (const struct sockaddr *) &sock, sizeof sock) < 0) { - nbdkit_error ("connect: %m"); - return -1; - } - return fd; -} - -/* Connect to a TCP socket, returning the fd on success */ -static int -nbdplug_connect_tcp (void) -{ - struct addrinfo hints = { .ai_family = AF_UNSPEC, - .ai_socktype = SOCK_STREAM, }; - struct addrinfo *result, *rp; - int r; - const int optval = 1; - int fd; - - nbdkit_debug ("connecting to TCP socket host=%s port=%s", hostname, port); - r = getaddrinfo (hostname, port, &hints, &result); - if (r != 0) { - nbdkit_error ("getaddrinfo: %s", gai_strerror (r)); - return -1; - } - - for (rp = result; rp; rp = rp->ai_next) { - fd = socket (rp->ai_family, rp->ai_socktype, rp->ai_protocol); - if (fd == -1) - continue; - if (connect (fd, rp->ai_addr, rp->ai_addrlen) != -1) - break; - close (fd); - } - freeaddrinfo (result); - if (rp == NULL) { - nbdkit_error ("connect: %m"); - close (fd); - return -1; - } - - if (setsockopt (fd, IPPROTO_TCP, TCP_NODELAY, &optval, - sizeof (int)) == -1) { - nbdkit_error ("cannot set TCP_NODELAY option: %m"); - close (fd); - return -1; - } - return fd; -} - /* Create the shared or per-connection handle. */ static struct handle * nbdplug_open_handle (int readonly) { struct handle *h; - struct old_handshake old; - uint64_t version; + int r; h = calloc (1, sizeof *h); if (h == NULL) { nbdkit_error ("malloc: %m"); return NULL; } + if (pipe (h->fds)) { + nbdkit_error ("pipe: %m"); + free (h); + return NULL; + } retry: + h->fd = -1; + h->nbd = nbd_create (); + if (!h->nbd) + goto err; + if (nbd_set_export_name (h->nbd, export) == -1) + goto err; + if (nbd_request_meta_context (h->nbd, "base:allocation") == -1) + goto err; if (sockname) - h->fd = nbdplug_connect_unix (); + r = nbd_connect_unix (h->nbd, sockname); else - h->fd = nbdplug_connect_tcp (); - if (h->fd == -1) { + r = nbd_connect_tcp (h->nbd, hostname, port); + if (r == -1) { if (retry--) { + nbdkit_debug ("connect failed; will try again: %s", nbd_get_error ()); sleep (1); + nbd_close (h->nbd); goto retry; } goto err; } - - /* old and new handshake share same meaning of first 16 bytes */ - if (read_full (h->fd, &old, offsetof (struct old_handshake, exportsize))) { - nbdkit_error ("unable to read magic: %m"); - goto err; - } - if (strncmp (old.nbdmagic, "NBDMAGIC", sizeof old.nbdmagic)) { - nbdkit_error ("wrong magic, %s is not an NBD server", servname); + h->fd = nbd_aio_get_fd (h->nbd); + if (h->fd == -1) goto err; - } - version = be64toh (old.version); - if (version == OLD_VERSION) { - nbdkit_debug ("trying oldstyle connection"); - if (read_full (h->fd, - (char *) &old + offsetof (struct old_handshake, exportsize), - sizeof old - offsetof (struct old_handshake, exportsize))) { - nbdkit_error ("unable to read old handshake: %m"); - goto err; - } - h->size = be64toh (old.exportsize); - h->flags = be16toh (old.eflags); - } - else if (version == NEW_VERSION) { - uint16_t gflags; - uint32_t cflags; - struct new_option opt; - struct new_handshake_finish finish; - size_t expect; - nbdkit_debug ("trying newstyle connection"); - if (read_full (h->fd, &gflags, sizeof gflags)) { - nbdkit_error ("unable to read global flags: %m"); - goto err; - } - gflags = be16toh (gflags); - cflags = htobe32 (gflags & (NBD_FLAG_FIXED_NEWSTYLE | NBD_FLAG_NO_ZEROES)); - if (write_full (h->fd, &cflags, sizeof cflags)) { - nbdkit_error ("unable to return global flags: %m"); - goto err; - } - - /* Prefer NBD_OPT_GO if possible */ - if (gflags & NBD_FLAG_FIXED_NEWSTYLE) { - int rc = nbdplug_newstyle_haggle (h); - if (rc < 0) - goto err; - if (!rc) - goto export_name; - } - else { - export_name: - /* Option haggling untried or failed, use older NBD_OPT_EXPORT_NAME */ - nbdkit_debug ("trying NBD_OPT_EXPORT_NAME"); - opt.version = htobe64 (NEW_VERSION); - opt.option = htobe32 (NBD_OPT_EXPORT_NAME); - opt.optlen = htobe32 (strlen (export)); - if (write_full (h->fd, &opt, sizeof opt) || - write_full (h->fd, export, strlen (export))) { - nbdkit_error ("unable to request export '%s': %m", export); - goto err; - } - expect = sizeof finish; - if (gflags & NBD_FLAG_NO_ZEROES) - expect -= sizeof finish.zeroes; - if (read_full (h->fd, &finish, expect)) { - nbdkit_error ("unable to read new handshake: %m"); - goto err; - } - h->size = be64toh (finish.exportsize); - h->flags = be16toh (finish.eflags); - } - } - else { - nbdkit_error ("unexpected version %#" PRIx64, version); - goto err; - } if (readonly) - h->flags |= NBD_FLAG_READ_ONLY; + h->readonly = true; /* Spawn a dedicated reader thread */ - if ((errno = pthread_mutex_init (&h->write_lock, NULL))) { - nbdkit_error ("failed to initialize write mutex: %m"); - goto err; - } if ((errno = pthread_mutex_init (&h->trans_lock, NULL))) { nbdkit_error ("failed to initialize transaction mutex: %m"); - pthread_mutex_destroy (&h->write_lock); goto err; } if ((errno = pthread_create (&h->reader, NULL, nbdplug_reader, h))) { nbdkit_error ("failed to initialize reader thread: %m"); - pthread_mutex_destroy (&h->write_lock); pthread_mutex_destroy (&h->trans_lock); goto err; } @@ -1109,8 +432,11 @@ nbdplug_open_handle (int readonly) return h; err: - if (h->fd >= 0) - close (h->fd); + close (h->fds[0]); + close (h->fds[1]); + nbdkit_error ("failure while creating nbd handle: %s", nbd_get_error ()); + if (h->nbd) + nbd_close (h->nbd); free (h); return NULL; } @@ -1128,14 +454,13 @@ nbdplug_open (int readonly) static void nbdplug_close_handle (struct handle *h) { - if (!h->dead) { - nbdplug_request_raw (h, 0, NBD_CMD_DISC, 0, 0, 0, NULL); - shutdown (h->fd, SHUT_WR); - } + if (nbd_shutdown (h->nbd) == -1) + nbdkit_debug ("failed to clean up handle: %s", nbd_get_error ()); if ((errno = pthread_join (h->reader, NULL))) nbdkit_debug ("failed to join reader thread: %m"); - close (h->fd); - pthread_mutex_destroy (&h->write_lock); + close (h->fds[0]); + close (h->fds[1]); + nbd_close (h->nbd); pthread_mutex_destroy (&h->trans_lock); free (h); } @@ -1150,87 +475,137 @@ nbdplug_close (void *handle) nbdplug_close_handle (h); } + + /* Get the file size. */ static int64_t nbdplug_get_size (void *handle) { struct handle *h = handle; + int64_t size = nbd_get_size (h->nbd); - return h->size; + if (size == -1) { + nbdkit_error ("failure to get size: %s", nbd_get_error ()); + return -1; + } + return size; } static int nbdplug_can_write (void *handle) { struct handle *h = handle; + int i = nbd_read_only (h->nbd); - return !(h->flags & NBD_FLAG_READ_ONLY); + if (i == -1) { + nbdkit_error ("failure to check readonly flag: %s", nbd_get_error ()); + return -1; + } + return !(i || h->readonly); } static int nbdplug_can_flush (void *handle) { struct handle *h = handle; + int i = nbd_can_flush (h->nbd); - return h->flags & NBD_FLAG_SEND_FLUSH; + if (i == -1) { + nbdkit_error ("failure to check flush flag: %s", nbd_get_error ()); + return -1; + } + return i; } static int nbdplug_is_rotational (void *handle) { struct handle *h = handle; + int i = nbd_is_rotational (h->nbd); - return h->flags & NBD_FLAG_ROTATIONAL; + if (i == -1) { + nbdkit_error ("failure to check rotational flag: %s", nbd_get_error ()); + return -1; + } + return i; } static int nbdplug_can_trim (void *handle) { struct handle *h = handle; + int i = nbd_can_trim (h->nbd); - return h->flags & NBD_FLAG_SEND_TRIM; + if (i == -1) { + nbdkit_error ("failure to check trim flag: %s", nbd_get_error ()); + return -1; + } + return i; } static int nbdplug_can_zero (void *handle) { struct handle *h = handle; + int i = nbd_can_zero (h->nbd); - return h->flags & NBD_FLAG_SEND_WRITE_ZEROES; + if (i == -1) { + nbdkit_error ("failure to check zero flag: %s", nbd_get_error ()); + return -1; + } + return i; } static int nbdplug_can_fua (void *handle) { struct handle *h = handle; + int i = nbd_can_fua (h->nbd); - return h->flags & NBD_FLAG_SEND_FUA ? NBDKIT_FUA_NATIVE : NBDKIT_FUA_NONE; + if (i == -1) { + nbdkit_error ("failure to check fua flag: %s", nbd_get_error ()); + return -1; + } + return i ? NBDKIT_FUA_NATIVE : NBDKIT_FUA_NONE; } static int nbdplug_can_multi_conn (void *handle) { struct handle *h = handle; + int i = nbd_can_multi_conn (h->nbd); - return h->flags & NBD_FLAG_CAN_MULTI_CONN; + if (i == -1) { + nbdkit_error ("failure to check multi-conn flag: %s", nbd_get_error ()); + return -1; + } + return i; } static int nbdplug_can_cache (void *handle) { struct handle *h = handle; + int i = nbd_can_cache (h->nbd); - if (h->flags & NBD_FLAG_SEND_CACHE) - return NBDKIT_CACHE_NATIVE; - return NBDKIT_CACHE_NONE; + if (i == -1) { + nbdkit_error ("failure to check cache flag: %s", nbd_get_error ()); + return -1; + } + return i ? NBDKIT_CACHE_NATIVE : NBDKIT_CACHE_NONE; } static int nbdplug_can_extents (void *handle) { struct handle *h = handle; + int i = nbd_can_meta_context (h->nbd, "base:allocation"); - return h->extents; + if (i == -1) { + nbdkit_error ("failure to check extents ability: %s", nbd_get_error ()); + return -1; + } + return i; } /* Read data from the file. */ @@ -1242,7 +617,8 @@ nbdplug_pread (void *handle, void *buf, uint32_t count, uint64_t offset, struct transaction *s; assert (!flags); - s = nbdplug_request_full (h, 0, NBD_CMD_READ, offset, count, NULL, buf, NULL); + /* XXX API changes in libnbd 0.1.2: */ + s = nbdplug_register (h, nbd_aio_pread (h->nbd, buf, count, offset /* , 0 */)); return nbdplug_reply (h, s); } @@ -1253,10 +629,10 @@ nbdplug_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, { struct handle *h = handle; struct transaction *s; + uint32_t f = flags & NBDKIT_FLAG_FUA ? LIBNBD_CMD_FLAG_FUA : 0; assert (!(flags & ~NBDKIT_FLAG_FUA)); - s = nbdplug_request_full (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, - NBD_CMD_WRITE, offset, count, buf, NULL, NULL); + s = nbdplug_register (h, nbd_aio_pwrite (h->nbd, buf, count, offset, f)); return nbdplug_reply (h, s); } @@ -1269,13 +645,12 @@ nbdplug_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) uint32_t f = 0; assert (!(flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM))); - assert (h->flags & NBD_FLAG_SEND_WRITE_ZEROES); if (!(flags & NBDKIT_FLAG_MAY_TRIM)) - f |= NBD_CMD_FLAG_NO_HOLE; + f |= LIBNBD_CMD_FLAG_NO_HOLE; if (flags & NBDKIT_FLAG_FUA) - f |= NBD_CMD_FLAG_FUA; - s = nbdplug_request (h, f, NBD_CMD_WRITE_ZEROES, offset, count); + f |= LIBNBD_CMD_FLAG_FUA; + s = nbdplug_register (h, nbd_aio_zero (h->nbd, count, offset, f)); return nbdplug_reply (h, s); } @@ -1285,10 +660,10 @@ nbdplug_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags) { struct handle *h = handle; struct transaction *s; + uint32_t f = flags & NBDKIT_FLAG_FUA ? LIBNBD_CMD_FLAG_FUA : 0; assert (!(flags & ~NBDKIT_FLAG_FUA)); - s = nbdplug_request (h, flags & NBDKIT_FLAG_FUA ? NBD_CMD_FLAG_FUA : 0, - NBD_CMD_TRIM, offset, count); + s = nbdplug_register (h, nbd_aio_trim (h->nbd, count, offset, f)); return nbdplug_reply (h, s); } @@ -1300,10 +675,36 @@ nbdplug_flush (void *handle, uint32_t flags) struct transaction *s; assert (!flags); - s = nbdplug_request (h, 0, NBD_CMD_FLUSH, 0, 0); + /* XXX API changes in libnbd 0.1.2: */ + s = nbdplug_register (h, nbd_aio_flush (h->nbd /* , f */)); return nbdplug_reply (h, s); } +struct nbdplug_extent_data +{ + struct nbdkit_extents *extents; + int err; +}; + +static void +nbdplug_extent (void *opaque, const char *metacontext, uint64_t offset, + uint32_t *entries, size_t nr_entries) +{ + struct nbdplug_extent_data *data = opaque; + + assert (strcmp (metacontext, "base:allocation") == 0); + assert (nr_entries % 2 == 0); + while (nr_entries && !data->err) { + /* We rely on the fact that NBDKIT_EXTENT_* match NBD_STATE_* */ + if (nbdkit_add_extent (data->extents, offset, + entries[0], entries[1]) == -1) + data->err = errno; + offset += entries[0]; + entries += 2; + nr_entries -= 2; + } +} + /* Read extents of the file. */ static int nbdplug_extents (void *handle, uint32_t count, uint64_t offset, @@ -1311,12 +712,20 @@ nbdplug_extents (void *handle, uint32_t count, uint64_t offset, { struct handle *h = handle; struct transaction *s; - uint32_t f = flags & NBDKIT_FLAG_REQ_ONE ? NBD_CMD_FLAG_REQ_ONE : 0; + uint32_t f = flags & NBDKIT_FLAG_REQ_ONE ? LIBNBD_CMD_FLAG_REQ_ONE : 0; + struct nbdplug_extent_data data = { extents, 0 }; + int r; - assert (!(flags & ~NBDKIT_FLAG_REQ_ONE) && h->extents); - s = nbdplug_request_full (h, f, NBD_CMD_BLOCK_STATUS, offset, count, NULL, - NULL, extents); - return nbdplug_reply (h, s); + assert (!(flags & ~NBDKIT_FLAG_REQ_ONE)); + /* XXX API changes in libnbd 0.1.2: */ + s = nbdplug_register (h, nbd_aio_block_status (h->nbd, count, offset, f, + &data, nbdplug_extent /* , f */)); + r = nbdplug_reply (h, s); + if (r == 0 && data.err) { + errno = data.err; + r = -1; + } + return r; } /* Cache a portion of the file. */ @@ -1327,7 +736,8 @@ nbdplug_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) struct transaction *s; assert (!flags); - s = nbdplug_request (h, 0, NBD_CMD_CACHE, offset, count); + /* XXX API changes in libnbd 0.1.2: */ + s = nbdplug_register (h, nbd_aio_cache (h->nbd, count, offset /* , f */)); return nbdplug_reply (h, s); } diff --git a/plugins/nbd/Makefile.am b/plugins/nbd/Makefile.am index bfc2a83..a854a44 100644 --- a/plugins/nbd/Makefile.am +++ b/plugins/nbd/Makefile.am @@ -41,7 +41,6 @@ nbdkit_nbd_plugin_la_SOURCES = \ nbdkit_nbd_plugin_la_CPPFLAGS = \ -I$(top_srcdir)/include \ -I$(top_srcdir)/common/include \ - -I$(top_srcdir)/common/protocol \ -I$(top_srcdir)/common/utils \ -I$(top_srcdir)/server nbdkit_nbd_plugin_la_CFLAGS = \ @@ -50,7 +49,6 @@ nbdkit_nbd_plugin_la_LDFLAGS = \ -module -avoid-version -shared \ -Wl,--version-script=$(top_srcdir)/plugins/plugins.syms nbdkit_nbd_plugin_la_LIBADD = \ - $(top_builddir)/common/protocol/libprotocol.la \ $(top_builddir)/common/utils/libutils.la # TODO: drop standalone version, which is locked at nbdkit 1.13.4 behavior, @@ -62,9 +60,15 @@ nbdkit_nbd_plugin_la_CFLAGS += \ $(LIBNBD_CFLAGS) nbdkit_nbd_plugin_la_LIBADD += \ $(LIBNBD_LIBS) + else !HAVE_LIBNBD nbdkit_nbd_plugin_la_SOURCES += \ nbd-standalone.c +nbdkit_nbd_plugin_la_CPPFLAGS += \ + -I$(top_srcdir)/common/protocol \ +nbdkit_nbd_plugin_la_LIBADD += \ + $(top_builddir)/common/protocol/libprotocol.la + endif !HAVE_LIBNBD if HAVE_POD -- 2.20.1
Eric Blake
2019-May-30 03:13 UTC
[Libguestfs] [nbdkit PATCH 4/4] nbd: Add TLS client support
Well, really add parameters to pass on to libnbd which does all the heavy lifting :) I'd love to also add a uri=... parameter, but until I fix configure.ac to permit a newer API than libnbd 0.1 that is not possible. Signed-off-by: Eric Blake <eblake@redhat.com> --- RFC: Compiled, but I have not heavily tested it yet. Ideally, I should enhance or copy test-tls{,-psk}.sh to show nbdkit as the encryption client forwarding on to a plaintext qemu-io Unix socket. --- plugins/nbd/nbdkit-nbd-plugin.pod | 77 +++++++++++++++++++++++++------ plugins/nbd/nbd.c | 72 +++++++++++++++++++++++++++-- TODO | 13 +----- 3 files changed, 132 insertions(+), 30 deletions(-) diff --git a/plugins/nbd/nbdkit-nbd-plugin.pod b/plugins/nbd/nbdkit-nbd-plugin.pod index 7baff98..77f34d3 100644 --- a/plugins/nbd/nbdkit-nbd-plugin.pod +++ b/plugins/nbd/nbdkit-nbd-plugin.pod @@ -5,7 +5,8 @@ nbdkit-nbd-plugin - nbdkit nbd plugin =head1 SYNOPSIS nbdkit nbd { socket=SOCKNAME | hostname=HOST [port=PORT] } [export=NAME] - [retry=N] [shared=BOOL] + [retry=N] [shared=BOOL] [tls=MODE] [tls-certificates=DIR] [tls-verify=BOOL] + [tls-username=NAME] [tls-psk=FILE] =head1 DESCRIPTION @@ -20,13 +21,11 @@ original server lacks it). Use of this plugin along with nbdkit filters (adding I<--filter> to the nbdkit command line) makes it possible to apply any nbdkit filter to any other NBD server. -For now, this is limited to connecting to another NBD server over an -unencrypted connection; if the data is sensitive, it is better to -stick to a Unix socket rather than transmitting plaintext over TCP. It -is feasible that future additions will support encryption. - =head1 PARAMETERS +The following parameters are available whether or not the plugin was +compiled against libnbd: + =over 4 =item B<socket=>SOCKNAME @@ -69,6 +68,46 @@ nbdkit will share that single connection. =back +The following parameters are only available if the plugin was compiled +against libnbd: + +=over 4 + +=item B<tls=>MODE + +Selects which TLS mode to use with the server. If no other tls option +is present, this defaults to C<off>, where the client does not attempt +encryption (and may be rejected by a server that requires it). If +omitted but another tls option is present, this defaults to C<on>, +where the client opportunistically attempts a TLS handshake, but will +continue running unencrypted if the server does not support +encryption. If set to C<require>, this requires an encrypted +connection to the server. + +=item B<tls-certificates=>DIR + +This specifies the directory containing X.509 client certificates to +present to the server. + +=item B<tls-verify=>BOOL + +Setting this to true disables server name verification, which opens +you to potential Man-in-the-Middle (MITM) attacks, but allows for a +simpler setup for distributing certificates. + +=item B<tls-username=>NAME + +If provided, this overrides the user name to present to the server +alongside the certificate. + +=item B<tls-psk=>FILE + +If provided, this is the filename containing the Pre-Shared Keys (PSK) +to present to the server. While this is easier to set up than X.509, +it requires that the PSK file be transmitted over a secure channel. + +=back + =head1 EXAMPLES Expose the contents of an export served by an old style server over a @@ -80,9 +119,9 @@ that the old server exits. nbdkit --exit-with-parent --tls=require nbd socket=$sock & exec /path/to/oldserver --socket=$sock ) - ┌────────────┐ ┌────────┐ ┌────────────┐ - │ new client │ ────────▶│ nbdkit │ ────────▶│ old server │ - └────────────┘ TCP └────────┘ Unix └────────────┘ + ┌────────────┐ TLS ┌────────┐ plaintext ┌────────────┐ + │ new client │ ────────▶│ nbdkit │ ───────────▶│ old server │ + └────────────┘ TCP └────────┘ Unix └────────────┘ Combine nbdkit's partition filter with qemu-nbd's ability to visit qcow2 files (nbdkit does not have a native qcow2 plugin), performing @@ -97,16 +136,23 @@ utilize a 5-second retry to give qemu-nbd time to create the socket: exec qemu-nbd -k $sock -f qcow2 /path/to/image.qcow2 ) Conversely, expose the contents of export I<foo> from a new style -server with unencrypted data to a client that can only consume +server with encrypted data to a client that can only consume unencrypted old style. Use I<--run> to clean up nbdkit at the time the -client exits. +client exits. In general, note that it is best to keep the plaintext +connection limited to a Unix socket on the local machine. - nbdkit -U - -o nbd hostname=example.com export=foo \ + nbdkit -U - -o nbd hostname=example.com export=foo tls=require \ --run '/path/to/oldclient --socket=$unixsocket' - ┌────────────┐ ┌────────┐ ┌────────────┐ - │ old client │ ────────▶│ nbdkit │ ────────▶│ new server │ - └────────────┘ Unix └────────┘ TCP └────────────┘ + ┌────────────┐ plaintext ┌────────┐ TLS ┌────────────┐ + │ old client │ ───────────▶│ nbdkit │ ────────▶│ new server │ + └────────────┘ Unix └────────┘ TCP └────────────┘ + +Look for the C<libnbd_version> line to learn if the nbd plugin was +compiled against libnbd for TLS support (required for the previous +example): + + nbdkit --dump-plugin nbd =head1 SEE ALSO @@ -115,6 +161,7 @@ L<nbdkit-captive(1)>, L<nbdkit-filter(3)>, L<nbdkit-tls(1)>, L<nbdkit-plugin(3)>, +L<libnbd(3)>, L<qemu-nbd(1)>. =head1 AUTHORS diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index b1e978a..dbf9096 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -87,6 +87,9 @@ static char *sockname; static const char *hostname; static const char *port; +/* XXX Need libnbd 0.1.1 to connect via URI */ +/* static const char *uri; */ + /* Human-readable server description */ static char *servname; @@ -100,6 +103,13 @@ static unsigned long retry; static bool shared; static struct handle *shared_handle; +/* Control TLS settings */ +static int tls = -1; +static const char *tls_certificates; +static int tls_verify = -1; +static const char *tls_username; +static const char *tls_psk; + static struct handle *nbdplug_open_handle (int readonly); static void nbdplug_close_handle (struct handle *h); @@ -113,9 +123,10 @@ nbdplug_unload (void) } /* Called for each key=value passed on the command line. This plugin - * accepts socket=<sockname> or hostname=<hostname>/port=<port> - * (exactly one connection required), and optional parameters - * export=<name>, retry=<n>. + * accepts socket=<sockname>, hostname=<hostname>/port=<port>, or + * uri=<uri> (exactly one connection required), and optional + * parameters export=<name>, retry=<n>, shared=<bool>, and various tls + * settings. */ static int nbdplug_config (const char *key, const char *value) @@ -134,6 +145,11 @@ nbdplug_config (const char *key, const char *value) hostname = value; else if (strcmp (key, "port") == 0) port = value; + else if (strcmp (key, "uri") == 0) { + /* XXX Implement once we build against newer libnbd */ + nbdkit_error ("libnbd too old for uri support"); + return -1; + } else if (strcmp (key, "export") == 0) export = value; else if (strcmp (key, "retry") == 0) { @@ -150,6 +166,29 @@ nbdplug_config (const char *key, const char *value) return -1; shared = r; } + else if (strcmp (key, "tls") == 0) { + if (strcasecmp (optarg, "require") == 0 || + strcasecmp (optarg, "required") == 0 || + strcasecmp (optarg, "force") == 0) + tls = 2; + else { + tls = nbdkit_parse_bool (optarg); + if (tls == -1) + exit (EXIT_FAILURE); + } + } + else if (strcmp (key, "tls-certificates") == 0) + tls_certificates = value; + else if (strcmp (key, "tls-verify") == 0) { + r = nbdkit_parse_bool (value); + if (r == -1) + return -1; + tls_verify = r; + } + else if (strcmp (key, "tls-username") == 0) + tls_username = value; + else if (strcmp (key, "tls-psk") == 0) + tls_psk = value; else { nbdkit_error ("unknown parameter '%s'", key); return -1; @@ -197,6 +236,9 @@ nbdplug_config_complete (void) if (!export) export = ""; + if (tls == -1) + tls = tls_certificates || tls_verify >= 0 || tls_username || tls_psk; + if (shared && (shared_handle = nbdplug_open_handle (false)) == NULL) return -1; return 0; @@ -207,6 +249,18 @@ nbdplug_config_complete (void) "hostname=<HOST> The hostname for the TCP socket to connect to.\n" \ "port=<PORT> TCP port or service name to use (default 10809).\n" \ "export=<NAME> Export name to connect to (default \"\").\n" \ + "tls=<MODE> How to use TLS; one of 'off', 'on', or 'require'.\n" \ + "tls-certificates=<DIR> Directory containing files for X.509 certificates.\n" \ + "tls-verify=<BOOL> True (default for X.509) to validate server.\n" \ + "tls-username=<NAME> Override username presented in X.509 TLS.\n" \ + "tls-psk=<FILE> File containing Pre-Shared Key for TLS.\n" \ + +static void +nbdplug_dump_plugin (void) +{ + /* XXX libnbd 0.1 doesn't expose a version in libnbd.h */ + printf ("libnbd_version=%s\n", "0.1"); +} #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL @@ -398,6 +452,17 @@ nbdplug_open_handle (int readonly) goto err; if (nbd_request_meta_context (h->nbd, "base:allocation") == -1) goto err; + if (nbd_set_tls (h->nbd, tls) == -1) + goto err; + if (tls_certificates && + nbd_set_tls_certificates (h->nbd, tls_certificates) == -1) + goto err; + if (tls_verify >= 0 && nbd_set_tls_verify_peer (h->nbd, tls_verify) == -1) + goto err; + if (tls_username && nbd_set_tls_username (h->nbd, tls_username) == -1) + goto err; + if (tls_psk && nbd_set_tls_psk_file (h->nbd, tls_psk) == -1) + goto err; if (sockname) r = nbd_connect_unix (h->nbd, sockname); else @@ -749,6 +814,7 @@ static struct nbdkit_plugin plugin = { .config = nbdplug_config, .config_complete = nbdplug_config_complete, .config_help = nbdplug_config_help, + .dump_plugin = nbdplug_dump_plugin, .open = nbdplug_open, .close = nbdplug_close, .get_size = nbdplug_get_size, diff --git a/TODO b/TODO index b9ddb1e..332400b 100644 --- a/TODO +++ b/TODO @@ -90,13 +90,6 @@ qemu-nbd for these use cases. https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg02971.html is a partial solution but it needs cleaning up. -nbdkit-nbd-plugin could use enhancements: - -* Enable client-side TLS (right now, the nbd plugin allows us to - support an encrypted client connecting to a plain server; but we - would need TLS to support a plain client connecting to an encrypted - server). - nbdkit-floppy-plugin: * Add boot sector support. In theory this is easy (eg. using @@ -159,11 +152,7 @@ Filters allow certain types of composition, but others would not be possible, for example RAIDing over multiple nbd sources. Because the plugin API limits us to loading a single plugin to the server, the best way to do this (and the most robust) is to compose multiple -nbdkit processes. - -The nbd plugin (plugins/nbd) already contains an NBD client, so we -could factor this client out and make it available to other plugins to -use. +nbdkit processes. Perhaps libnbd will prove useful for this purpose. Build-related ------------- -- 2.20.1
On 5/29/19 10:13 PM, Eric Blake wrote:> Benchmark-wise, using the same setup as in commit e897ed70, I see > an order-of-magnitude slowdown: > > Pre-patch, the runs averaged 1.266s, 1.30E+08 bits/s > Post-patch, the runs averaged 11.663s, 1.41E+07 bits/s > > This will need further profiling to determine how much is nbdkit's > fault, and how much is libnbd's. I think that we are probably holding > some locks for too long, resulting in some serialized performance. > Also, the standalone code was able to run read of command 1 in > parallel with write of command 2 via separate threads, whereas > libnbd's state machine is serializing everything (whether or not the > state machine spreads out the I/O to do writes from the thread calling > nbd_aio_FOO and reads from the reader thread, the two are never run at > once).Rich identified and fixed the culprit - libnbd was not setting TCP_NODELAY (disabling Nagle's algorithm) the way nbd-standalone.c did, which meant that any request that gets split over TCP windowing sizes waits to send the second packet until the ACK for the first has been received by the server. While disabling Nagle's increases network overhead (you are sending more short packets with their overhead rather than bundling things into larger packets by virtue of the downtime waiting for the ACK), it is an artificial delay (the server can't process anything until the packets arrive). libnbd 0.1.2 now disables Nagle's algorithm, and my repeat of the tests with that change showed an improvement to 2.180s, which is more eassily explained by the serialization nature that libnbd is never read()ing from one thread at the same time another thread is write()ing. I'd have to intentionally cripple nbd-standalone.c to do that same serialization, to see if libnbd actually offers an overall performance gain when I/O is serialized (may not be as easy as it sounds; I was relying on blocking I/O from separate threads, but merely serializing that so that no thread is doing I/O concurrent with another risks deadlock in the case of a client sending a large NBD_CMD_READ followed by NBD_CMD_WRITE to a server that responds strictly serially, where the server is waiting for the client to finish the read's response but the client is waiting for the server to parse the write's request; libnbd works around that by using non-blocking sockets and prioritizing reads over writes rather than insisting on complete transactions).> -/* Connect to a TCP socket, returning the fd on success */ > -static int > -nbdplug_connect_tcp (void) > -{ > - struct addrinfo hints = { .ai_family = AF_UNSPEC, > - .ai_socktype = SOCK_STREAM, }; > - struct addrinfo *result, *rp; > - int r; > - const int optval = 1; > - int fd; > -> - > - if (setsockopt (fd, IPPROTO_TCP, TCP_NODELAY, &optval, > - sizeof (int)) == -1) { > - nbdkit_error ("cannot set TCP_NODELAY option: %m"); > - close (fd); > - return -1; > - }-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org