I'm still working on the interleaving (and Rich reminded me on IRC that we still don't have THREAD_MODEL_PARALLEL working anywhere yet, anyways). Since nbdkit doesn't really have a parallel plugin yet, my testing on that front will have to use qemu-nbd as the original server, as well as qemu-io as the driver (qemu-io's aio_read and aio_write commands can be used to trigger interleaved requests, and well-placed gdb breakpoints can make the triggering even easier to test). Do people like the name 'nbd-forward' instead of 'nbd'? The name of the plugin describes what it connects to ('nbdkit file ...' connects to a file; 'nbdkit streaming ...' connects to a streaming fd, so 'nbdkit nbd ...' connects to another nbd server); but the reuse of the term 'nbd' might be a bit confusing, where a longer name would make it more obvious on intended usage. Here's how I tested: term1$ rm -f sock; ./nbdkit -o -U sock [-r] file file=TODO term2$ ./nbdkit [-r] -n nbd socket=sock term3$ qemu-io -f raw nbd://localhost:10809/ [-r] which let me use qemu-io commands to peek (and poke) the TODO file through two layers of nbdkit processes, while upgrading from an old-style handshake to a newstyle (I also used -fv on nbdkit and --trace='nbd_*' on qemu-io to turn up the debugging messages, to actually watch the interactions). Patch 2 shows my thought process to switching things to fully parallel: patch 3/2 (still unwritten at the moment) would basically change a single transaction into a linked list, where nbd_request_full() adds another transaction to the list, nbd_read_raw() uses the cookie received from the server to look up a transaction to remove from the list, and where breaking the while(1) loop of nbd_reader() cleans up any remaining transactions still on the list. Eric Blake (2): nbd: Add new nbd forwarding plugin nbd: Split reading into separate thread configure.ac | 1 + plugins/Makefile.am | 1 + plugins/nbd/Makefile.am | 63 ++++ plugins/nbd/nbd.c | 629 ++++++++++++++++++++++++++++++++++++++ plugins/nbd/nbdkit-nbd-plugin.pod | 96 ++++++ 5 files changed, 790 insertions(+) create mode 100644 plugins/nbd/Makefile.am create mode 100644 plugins/nbd/nbd.c create mode 100644 plugins/nbd/nbdkit-nbd-plugin.pod -- 2.13.6
Eric Blake
2017-Nov-14 19:23 UTC
[Libguestfs] [nbdkit PATCH v2 1/2] nbd: Add new nbd forwarding plugin
This is a minimal implementation of an NBD forwarder; it lets us convert between old and newstyle connections (great if a client expects one style but the real server only provides the other), or add TLS safety on top of a server without having to rewrite that server. Right now, the real server is expected to live on a named Unix socket with no encryption, and the transactions are serialized rather than interleaved; further enhancements could be made to also permit TCP servers, TLS encryption, and/or more efficient transmission. I also envision the possibility of enhancing our testsuite to use NBD forwarding as a great test of our server. Some of the design decisions made in this file are geared towards the future addition of allowing interleaved transmission (for example, the fact that we track a separate 'struct transmission', even though only one exists at a time for now; and the fact that all reading is done through a helper function). Signed-off-by: Eric Blake <eblake@redhat.com> --- configure.ac | 1 + plugins/Makefile.am | 1 + plugins/nbd/Makefile.am | 63 +++++ plugins/nbd/nbd.c | 546 ++++++++++++++++++++++++++++++++++++++ plugins/nbd/nbdkit-nbd-plugin.pod | 96 +++++++ 5 files changed, 707 insertions(+) create mode 100644 plugins/nbd/Makefile.am create mode 100644 plugins/nbd/nbd.c create mode 100644 plugins/nbd/nbdkit-nbd-plugin.pod diff --git a/configure.ac b/configure.ac index 0856f97..86ce5f2 100644 --- a/configure.ac +++ b/configure.ac @@ -453,6 +453,7 @@ AC_CONFIG_FILES([Makefile plugins/guestfs/Makefile plugins/gzip/Makefile plugins/libvirt/Makefile + plugins/nbd/Makefile plugins/ocaml/Makefile plugins/perl/Makefile plugins/python/Makefile diff --git a/plugins/Makefile.am b/plugins/Makefile.am index f8483e6..790fc65 100644 --- a/plugins/Makefile.am +++ b/plugins/Makefile.am @@ -39,6 +39,7 @@ SUBDIRS = \ guestfs \ gzip \ libvirt \ + nbd \ ocaml \ perl \ python \ diff --git a/plugins/nbd/Makefile.am b/plugins/nbd/Makefile.am new file mode 100644 index 0000000..6d105c2 --- /dev/null +++ b/plugins/nbd/Makefile.am @@ -0,0 +1,63 @@ +# nbdkit +# Copyright (C) 2017 Red Hat Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +EXTRA_DIST = nbdkit-nbd-plugin.pod + +CLEANFILES = *~ + +plugindir = $(libdir)/nbdkit/plugins + +plugin_LTLIBRARIES = nbdkit-nbd-plugin.la + +nbdkit_nbd_plugin_la_SOURCES = \ + nbd.c \ + $(top_srcdir)/include/nbdkit-plugin.h + +nbdkit_nbd_plugin_la_CPPFLAGS = \ + -I$(top_srcdir)/include \ + -I$(top_srcdir)/src +nbdkit_nbd_plugin_la_CFLAGS = \ + $(WARNINGS_CFLAGS) +nbdkit_nbd_plugin_la_LDFLAGS = \ + -module -avoid-version -shared + +if HAVE_POD2MAN + +man_MANS = nbdkit-nbd-plugin.1 +CLEANFILES += $(man_MANS) + +nbdkit-nbd-plugin.1: nbdkit-nbd-plugin.pod + $(POD2MAN) $(POD2MAN_ARGS) --section=1 --name=`basename $@ .1` $< $@.t && \ + if grep 'POD ERROR' $@.t; then rm $@.t; exit 1; fi && \ + mv $@.t $@ + +endif diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c new file mode 100644 index 0000000..35f2781 --- /dev/null +++ b/plugins/nbd/nbd.c @@ -0,0 +1,546 @@ +/* nbdkit + * Copyright (C) 2017 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdlib.h> +#include <stddef.h> +#include <stdbool.h> +#include <string.h> +#include <unistd.h> +#include <errno.h> +#include <inttypes.h> +#include <limits.h> +#include <sys/socket.h> +#include <sys/un.h> +#include <assert.h> + +#include <nbdkit-plugin.h> +#include "protocol.h" + +static char *sockname = NULL; +static char *export = NULL; + +static void +nbd_unload (void) +{ + free (sockname); + free (export); +} + +/* Called for each key=value passed on the command line. This plugin + * accepts socket=<sockname> (required for now) and export=<name> (optional). + */ +static int +nbd_config (const char *key, const char *value) +{ + if (strcmp (key, "socket") == 0) { + /* See FILENAMES AND PATHS in nbdkit-plugin(3) */ + free (sockname); + sockname = nbdkit_absolute_path (value); + if (!sockname) + return -1; + } + else if (strcmp (key, "export") == 0) { + free (export); + export = strdup (value); + if (!export) { + nbdkit_error ("memory failure: %m"); + return -1; + } + } + else { + nbdkit_error ("unknown parameter '%s'", key); + return -1; + } + + return 0; +} + +/* Check the user did pass a socket=<SOCKNAME> parameter. */ +static int +nbd_config_complete (void) +{ + struct sockaddr_un sock; + + if (sockname == NULL) { + nbdkit_error ("you must supply the socket=<SOCKNAME> parameter after the plugin name on the command line"); + return -1; + } + if (strlen (sockname) >= sizeof sock.sun_path) { + nbdkit_error ("socket file name too large"); + return -1; + } + if (!export) + export = strdup (""); + if (!export) { + nbdkit_error ("memory failure: %m"); + return -1; + } + return 0; +} + +#define nbd_config_help \ + "socket=<SOCKNAME> (required) The Unix socket to connect to.\n" \ + "export=<NAME> Export name to connect to (default "").\n" \ + +/* TODO Allow more parallelism than one request at a time */ +#define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS + +/* The per-transaction details */ +struct transaction { + /* TODO: protocol allows 64-bit handle, but until we allow + interleaved transactions, 31 bits with wraparound is plenty */ + int cookie; + void *buf; + uint32_t count; +}; + +/* The per-connection handle */ +struct handle { + int fd; + int flags; + int64_t size; + /* Our choice of THREAD_MODEL means at most one outstanding transaction */ + struct transaction trans; + bool dead; +}; + +/* Read an entire buffer, returning 0 on success or -1 with errno set. */ +static int +read_full (int fd, void *buf, size_t len) +{ + ssize_t r; + + while (len) { + r = read (fd, buf, len); + if (r < 0) { + if (errno == EINTR || errno == EAGAIN) + continue; + return -1; + } + if (!r) { + /* Unexpected EOF */ + errno = EBADMSG; + return -1; + } + buf += r; + len -= r; + } + return 0; +} + +/* Write an entire buffer, returning 0 on success or -1 with errno set. */ +static int +write_full (int fd, const void *buf, size_t len) +{ + ssize_t r; + + while (len) { + r = write (fd, buf, len); + if (r < 0) { + if (errno == EINTR || errno == EAGAIN) + continue; + return -1; + } + buf += r; + len -= r; + } + return 0; +} + +/* Called during transmission phases when there is no hope of + * resynchronizing with the server, and all further requests from the + * client will fail. Returns -1 for convenience. */ +static int +nbd_mark_dead (struct handle *h) +{ + int err = errno; + + if (!h->dead) { + nbdkit_debug ("permanent failure while talking to server %s: %m", + sockname); + h->dead = true; + } + else if (!err) + errno = ESHUTDOWN; + /* NBD only accepts a limited set of errno values over the wire, and + nbdkit converts all other values to EINVAL. If we died due to an + errno value that cannot transmit over the wire, translate it to + ESHUTDOWN instead. */ + if (err == EPIPE || err == EBADMSG) + nbdkit_set_error (ESHUTDOWN); + return -1; +} + +/* Send a request, return 0 on success or -1 on write failure. */ +static int +nbd_request_raw (struct handle *h, uint32_t type, uint64_t offset, + uint32_t count, uint64_t cookie) +{ + struct request req = { + .magic = htobe32 (NBD_REQUEST_MAGIC), + /* TODO nbdkit should have a way to pass flags, separate from cmd type */ + .type = htobe32 (type), + .handle = cookie, /* Opaque to server, so endianness doesn't matter */ + .offset = htobe64 (offset), + .count = htobe32 (count), + }; + + nbdkit_debug ("sending request with type %d and cookie %#" PRIx64, type, + cookie); + return write_full (h->fd, &req, sizeof req); +} + +/* Perform the request half of a transaction. On success, return the + non-negative cookie to match to the reply; on error return -1. */ +static int +nbd_request_full (struct handle *h, uint32_t type, uint64_t offset, + uint32_t count, const void *req_buf, void *rep_buf) +{ + if (h->dead) + return nbd_mark_dead (h); + h->trans.buf = rep_buf; + h->trans.count = rep_buf ? count : 0; + if (++h->trans.cookie > INT_MAX) + h->trans.cookie = 1; + if (nbd_request_raw (h, type, offset, count, h->trans.cookie) < 0) + return nbd_mark_dead (h); + if (req_buf && write_full (h->fd, req_buf, count) < 0) + return nbd_mark_dead (h); + return h->trans.cookie; +} + +/* Shorthand for nbd_request_full when no extra buffers are involved. */ +static int +nbd_request (struct handle *h, uint32_t type, uint64_t offset, uint32_t count) +{ + return nbd_request_full (h, type, offset, count, NULL, NULL); +} + +/* Read a reply, and look up the corresponding transaction. Return + the server's non-negative answer (converted to local errno value) + on success, or -1 on read failure. */ +static int +nbd_reply_raw (struct handle *h, struct transaction *trans) +{ + struct reply rep; + + if (read_full (h->fd, &rep, sizeof rep) < 0) + return nbd_mark_dead (h); + *trans = h->trans; + nbdkit_debug ("received reply for cookie %#" PRIx64, rep.handle); + if (be32toh (rep.magic) != NBD_REPLY_MAGIC || rep.handle != trans->cookie) + return nbd_mark_dead (h); + switch (be32toh (rep.error)) { + case NBD_SUCCESS: + if (trans->buf && read_full (h->fd, trans->buf, trans->count) < 0) + return nbd_mark_dead (h); + return 0; + case NBD_EPERM: + return EPERM; + case NBD_EIO: + return EIO; + case NBD_ENOMEM: + return ENOMEM; + default: + nbdkit_debug ("unexpected error %d, squashing to EINVAL", + be32toh (rep.error)); + /* fallthrough */ + case NBD_EINVAL: + return EINVAL; + case NBD_ENOSPC: + return ENOSPC; + case NBD_ESHUTDOWN: + /* The server wants us to initiate soft-disconnect. Because our + THREAD_MODEL does not permit interleaved requests, we know that + there are no other pending outstanding messages, so we can + attempt that immediately. + + TODO: Once we allow interleaved requests, handling + soft-disconnect properly will be trickier */ + nbd_request_raw (h, NBD_CMD_DISC, 0, 0, 0); + errno = ESHUTDOWN; + return nbd_mark_dead (h); + } +} + +/* Perform the reply half of a transaction. */ +static int +nbd_reply (struct handle *h, int cookie) +{ + int err; + struct transaction trans; + + err = nbd_reply_raw (h, &trans); + assert (err < 0 || cookie == trans.cookie); + if (err > 0) + errno = err; + return err ? -1 : 0; +} + +/* Create the per-connection handle. */ +static void * +nbd_open (int readonly) +{ + struct handle *h; + struct sockaddr_un sock = { .sun_family = AF_UNIX }; + struct old_handshake old; + uint64_t version; + + h = calloc (1, sizeof *h); + if (h == NULL) { + nbdkit_error ("malloc: %m"); + return NULL; + } + h->fd = socket (AF_UNIX, SOCK_STREAM, 0); + if (h->fd < 0) { + nbdkit_error ("socket: %m"); + return NULL; + } + strncpy (sock.sun_path, sockname, sizeof (sock.sun_path)); + if (connect (h->fd, (const struct sockaddr *) &sock, sizeof sock) < 0) { + nbdkit_error ("connect: %m"); + goto err; + } + + /* old and new handshake share same meaning of first 16 bytes */ + if (read_full (h->fd, &old, offsetof (struct old_handshake, exportsize))) { + nbdkit_error ("unable to read magic: %m"); + goto err; + } + if (strncmp(old.nbdmagic, "NBDMAGIC", sizeof old.nbdmagic)) { + nbdkit_error ("wrong magic, %s is not an NBD server", sockname); + goto err; + } + version = be64toh (old.version); + if (version == OLD_VERSION) { + if (read_full (h->fd, + (char *) &old + offsetof (struct old_handshake, exportsize), + sizeof old - offsetof (struct old_handshake, exportsize))) { + nbdkit_error ("unable to read old handshake: %m"); + goto err; + } + h->size = be64toh (old.exportsize); + h->flags = be16toh (old.eflags); + } + else if (version == NEW_VERSION) { + uint16_t gflags; + uint32_t cflags; + struct new_option opt; + struct new_handshake_finish finish; + size_t expect; + + if (read_full (h->fd, &gflags, sizeof gflags)) { + nbdkit_error ("unable to read global flags: %m"); + goto err; + } + cflags = htobe32(gflags & (NBD_FLAG_FIXED_NEWSTYLE | NBD_FLAG_NO_ZEROES)); + if (write_full (h->fd, &cflags, sizeof cflags)) { + nbdkit_error ("unable to return global flags: %m"); + goto err; + } + + /* For now, we don't do any option haggling, but go straight into + transmission phase */ + opt.version = htobe64 (NEW_VERSION); + opt.option = htobe32 (NBD_OPT_EXPORT_NAME); + opt.optlen = htobe32 (strlen (export)); + if (write_full (h->fd, &opt, sizeof opt) || + write_full (h->fd, export, strlen (export))) { + nbdkit_error ("unable to request export '%s': %m", export); + goto err; + } + expect = sizeof finish; + if (gflags & NBD_FLAG_NO_ZEROES) + expect -= sizeof finish.zeroes; + if (read_full (h->fd, &finish, expect)) { + nbdkit_error ("unable to read new handshake: %m"); + goto err; + } + h->size = be64toh (finish.exportsize); + h->flags = be16toh (finish.eflags); + } + else { + nbdkit_error ("unexpected version %#" PRIx64, version); + goto err; + } + + return h; + + err: + close (h->fd); + return NULL; +} + +/* Free up the per-connection handle. */ +static void +nbd_close (void *handle) +{ + struct handle *h = handle; + + if (!h->dead) + nbd_request_raw (h, NBD_CMD_DISC, 0, 0, 0); + close (h->fd); + free (h); +} + +/* Get the file size. */ +static int64_t +nbd_get_size (void *handle) +{ + struct handle *h = handle; + + return h->size; +} + +static int +nbd_can_write (void *handle) +{ + struct handle *h = handle; + + return !(h->flags & NBD_FLAG_READ_ONLY); +} + +static int +nbd_can_flush (void *handle) +{ + struct handle *h = handle; + + return h->flags & NBD_FLAG_SEND_FLUSH; +} + +static int +nbd_is_rotational (void *handle) +{ + struct handle *h = handle; + + return h->flags & NBD_FLAG_ROTATIONAL; +} + +static int +nbd_can_trim (void *handle) +{ + struct handle *h = handle; + + return h->flags & NBD_FLAG_SEND_TRIM; +} + +/* Read data from the file. */ +static int +nbd_pread (void *handle, void *buf, uint32_t count, uint64_t offset) +{ + struct handle *h = handle; + int c; + + c = nbd_request_full (h, NBD_CMD_READ, offset, count, NULL, buf); + return c < 0 ? c : nbd_reply (h, c); +} + +/* Write data to the file. */ +static int +nbd_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset) +{ + struct handle *h = handle; + int c; + + c = nbd_request_full (h, NBD_CMD_WRITE, offset, count, buf, NULL); + return c < 0 ? c : nbd_reply (h, c); +} + +/* Write zeroes to the file. */ +static int +nbd_zero (void *handle, uint32_t count, uint64_t offset, int may_trim) +{ + struct handle *h = handle; + uint32_t cmd = NBD_CMD_WRITE_ZEROES; + int c; + + if (!(h->flags & NBD_FLAG_SEND_WRITE_ZEROES)) { + /* Trigger a fall back to regular writing */ + errno = EOPNOTSUPP; + return -1; + } + + if (!may_trim) + cmd |= NBD_CMD_FLAG_NO_HOLE; + c = nbd_request (h, cmd, offset, count); + return c < 0 ? c : nbd_reply (h, c); +} + +/* Trim a portion of the file. */ +static int +nbd_trim (void *handle, uint32_t count, uint64_t offset) +{ + struct handle *h = handle; + int c; + + c = nbd_request (h, NBD_CMD_TRIM, offset, count); + return c < 0 ? c : nbd_reply (h, c); +} + +/* Flush the file to disk. */ +static int +nbd_flush (void *handle) +{ + struct handle *h = handle; + int c; + + c = nbd_request (h, NBD_CMD_FLUSH, 0, 0); + return c < 0 ? c : nbd_reply (h, c); +} + +static struct nbdkit_plugin plugin = { + .name = "nbd", + .longname = "nbdkit nbd plugin", + .version = PACKAGE_VERSION, + .unload = nbd_unload, + .config = nbd_config, + .config_complete = nbd_config_complete, + .config_help = nbd_config_help, + .open = nbd_open, + .close = nbd_close, + .get_size = nbd_get_size, + .can_write = nbd_can_write, + .can_flush = nbd_can_flush, + .is_rotational = nbd_is_rotational, + .can_trim = nbd_can_trim, + .pread = nbd_pread, + .pwrite = nbd_pwrite, + .zero = nbd_zero, + .flush = nbd_flush, + .trim = nbd_trim, + .errno_is_preserved = 1, +}; + +NBDKIT_REGISTER_PLUGIN(plugin) diff --git a/plugins/nbd/nbdkit-nbd-plugin.pod b/plugins/nbd/nbdkit-nbd-plugin.pod new file mode 100644 index 0000000..e02e6cc --- /dev/null +++ b/plugins/nbd/nbdkit-nbd-plugin.pod @@ -0,0 +1,96 @@ +=encoding utf8 + +=head1 NAME + +nbdkit-nbd-plugin - nbdkit nbd plugin + +=head1 SYNOPSIS + + nbdkit nbd socket=SOCKNAME [export=NAME] + +=head1 DESCRIPTION + +C<nbdkit-nbd-plugin> is an NBD forwarding plugin for L<nbdkit(1)>. + +It provides an NBD server that forwards all traffic as a client to +another existing NBD server. A primary usage of this setup is to +alter the set of features available to the ultimate end client, +without having to change the original server (for example, to convert +between oldstyle and newtyle, or to add TLS support where the original +server lacks it). + +For now, this is limited to connecting to another NBD server over a +named Unix socket without TLS, although it is feasible that future +additions will support network sockets and encryption. + +=head1 PARAMETERS + +=over 4 + +=item B<socket=SOCKNAME> + +Connect to the NBD server located at the Unix socket C<SOCKNAME>. The +server can speak either new or old style protocol. + +This parameter is required. + +=item B<export=NAME> + +If this parameter is given, and the server speaks new style protocol, +then connect to the named export instead of the default export (the +empty string). + +=back + +=head1 SEE ALSO + +L<nbdkit(1)>, +L<nbdkit-plugin(3)>. + +=head1 AUTHORS + +Eric Blake + +=head1 COPYRIGHT + +Copyright (C) 2017 Red Hat Inc. + +=head1 LICENSE + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +=over 4 + +=item * + +Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + +=item * + +Redistributions in binary form must reproduce the above copyright +notice, this list of conditions and the following disclaimer in the +documentation and/or other materials provided with the distribution. + +=item * + +Neither the name of Red Hat nor the names of its contributors may be +used to endorse or promote products derived from this software without +specific prior written permission. + +=back + +THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +SUCH DAMAGE. -- 2.13.6
Eric Blake
2017-Nov-14 19:23 UTC
[Libguestfs] [nbdkit PATCH v2 2/2] nbd: Split reading into separate thread
In preparation for allowing interleaved response, refactor the nbd forwarder so that writes are still done from the thread handling the original request from the client, but all reads are done by a dedicated reader thread. Control between the two flags is gated by a mutex for storing the transaction information, coupled with a pipe for the reader thread to send the final status back to the handler thread. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbd.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 100 insertions(+), 17 deletions(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 35f2781..770fb71 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -44,6 +44,7 @@ #include <sys/socket.h> #include <sys/un.h> #include <assert.h> +#include <pthread.h> #include <nbdkit-plugin.h> #include "protocol.h" @@ -119,18 +120,23 @@ nbd_config_complete (void) /* The per-transaction details */ struct transaction { - /* TODO: protocol allows 64-bit handle, but until we allow - interleaved transactions, 31 bits with wraparound is plenty */ - int cookie; + union { + uint64_t cookie; + int fds[2]; + } u; void *buf; uint32_t count; }; /* The per-connection handle */ struct handle { + /* These fields are read-only once initialized */ int fd; int flags; int64_t size; + pthread_t reader; + + pthread_mutex_t lock; /* Covers access to all fields below */ /* Our choice of THREAD_MODEL means at most one outstanding transaction */ struct transaction trans; bool dead; @@ -179,6 +185,18 @@ write_full (int fd, const void *buf, size_t len) return 0; } +static void nbd_lock (struct handle *h) +{ + int r = pthread_mutex_lock (&h->lock); + assert (!r); +} + +static void nbd_unlock (struct handle *h) +{ + int r = pthread_mutex_unlock (&h->lock); + assert (!r); +} + /* Called during transmission phases when there is no hope of * resynchronizing with the server, and all further requests from the * client will fail. Returns -1 for convenience. */ @@ -187,6 +205,7 @@ nbd_mark_dead (struct handle *h) { int err = errno; + nbd_lock (h); if (!h->dead) { nbdkit_debug ("permanent failure while talking to server %s: %m", sockname); @@ -194,6 +213,7 @@ nbd_mark_dead (struct handle *h) } else if (!err) errno = ESHUTDOWN; + nbd_unlock (h); /* NBD only accepts a limited set of errno values over the wire, and nbdkit converts all other values to EINVAL. If we died due to an errno value that cannot transmit over the wire, translate it to @@ -223,22 +243,34 @@ nbd_request_raw (struct handle *h, uint32_t type, uint64_t offset, } /* Perform the request half of a transaction. On success, return the - non-negative cookie to match to the reply; on error return -1. */ + non-negative fd for reading the reply; on error return -1. */ static int nbd_request_full (struct handle *h, uint32_t type, uint64_t offset, uint32_t count, const void *req_buf, void *rep_buf) { + int err; + if (h->dead) return nbd_mark_dead (h); + if (pipe (h->trans.u.fds)) { + nbdkit_error ("unable to create pipe: %m"); + /* Still in sync with server, so don't mark connection dead */ + return -1; + } h->trans.buf = rep_buf; h->trans.count = rep_buf ? count : 0; - if (++h->trans.cookie > INT_MAX) - h->trans.cookie = 1; - if (nbd_request_raw (h, type, offset, count, h->trans.cookie) < 0) - return nbd_mark_dead (h); + if (nbd_request_raw (h, type, offset, count, h->trans.u.cookie) < 0) + goto err; if (req_buf && write_full (h->fd, req_buf, count) < 0) - return nbd_mark_dead (h); - return h->trans.cookie; + goto err; + return h->trans.u.fds[0]; + + err: + err = errno; + close (h->trans.u.fds[0]); + close (h->trans.u.fds[1]); + errno = err; + return nbd_mark_dead (h); } /* Shorthand for nbd_request_full when no extra buffers are involved. */ @@ -258,9 +290,11 @@ nbd_reply_raw (struct handle *h, struct transaction *trans) if (read_full (h->fd, &rep, sizeof rep) < 0) return nbd_mark_dead (h); + nbd_lock (h); *trans = h->trans; + nbd_unlock (h); nbdkit_debug ("received reply for cookie %#" PRIx64, rep.handle); - if (be32toh (rep.magic) != NBD_REPLY_MAGIC || rep.handle != trans->cookie) + if (be32toh (rep.magic) != NBD_REPLY_MAGIC || rep.handle != trans->u.cookie) return nbd_mark_dead (h); switch (be32toh (rep.error)) { case NBD_SUCCESS: @@ -295,17 +329,52 @@ nbd_reply_raw (struct handle *h, struct transaction *trans) } } +/* Reader loop. */ +void *nbd_reader (void *handle) +{ + struct handle *h = handle; + bool done = false; + + while (!done) { + struct transaction trans; + int r; + int fd; + + r = nbd_reply_raw (h, &trans); + fd = trans.u.fds[1]; + trans.u.fds[1] = -1; + if (r >= 0) { + if (write (fd, &r, sizeof r) != sizeof r) { + nbdkit_error ("failed to write pipe: %m"); + abort (); + } + } + close (fd); + nbd_lock (h); + done = h->dead; + nbd_unlock (h); + } + return NULL; +} + /* Perform the reply half of a transaction. */ static int -nbd_reply (struct handle *h, int cookie) +nbd_reply (struct handle *h, int fd) { int err; - struct transaction trans; - err = nbd_reply_raw (h, &trans); - assert (err < 0 || cookie == trans.cookie); - if (err > 0) - errno = err; + if (read (fd, &err, sizeof err) != sizeof err) { + nbdkit_debug ("failed to read pipe: %m"); + err = EIO; + } + nbd_lock (h); + /* TODO This check is just for sanity that the reader thread concept + works; it won't work once we allow interleaved requests */ + assert (fd == h->trans.u.fds[0]); + h->trans.u.fds[0] = -1; + nbd_unlock (h); + close (fd); + errno = err; return err ? -1 : 0; } @@ -396,6 +465,17 @@ nbd_open (int readonly) goto err; } + /* Spawn a dedicated reader thread */ + if ((errno = pthread_mutex_init (&h->lock, NULL))) { + nbdkit_error ("failed to initialize mutex"); + goto err; + } + if ((errno = pthread_create (&h->reader, NULL, nbd_reader, h))) { + nbdkit_error ("failed to initialize reader thread"); + pthread_mutex_destroy (&h->lock); + goto err; + } + return h; err: @@ -412,6 +492,9 @@ nbd_close (void *handle) if (!h->dead) nbd_request_raw (h, NBD_CMD_DISC, 0, 0, 0); close (h->fd); + if ((errno = pthread_join (h->reader, NULL))) + nbdkit_debug ("failed to join reader thread: %m"); + pthread_mutex_destroy (&h->lock); free (h); } -- 2.13.6
Richard W.M. Jones
2017-Nov-16 11:53 UTC
Re: [Libguestfs] [nbdkit PATCH v2 0/2] add nbd plugin
ACK series, with one minor change: Please add a reference to the new manual page in the ‘SEE ALSO’ section at the bottom of the following two files: docs/nbdkit.pod docs/nbdkit-plugin.pod It should be pretty clear what to change if you look at those files. Thanks, Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Eric Blake
2017-Nov-16 13:59 UTC
Re: [Libguestfs] [nbdkit PATCH v2 1/2] nbd: Add new nbd forwarding plugin
On 11/14/2017 01:23 PM, Eric Blake wrote:> This is a minimal implementation of an NBD forwarder; it lets us > convert between old and newstyle connections (great if a client > expects one style but the real server only provides the other), > or add TLS safety on top of a server without having to rewrite > that server. Right now, the real server is expected to live > on a named Unix socket with no encryption, and the transactions > are serialized rather than interleaved; further enhancements > could be made to also permit TCP servers, TLS encryption, and/or > more efficient transmission. > > I also envision the possibility of enhancing our testsuite to > use NBD forwarding as a great test of our server. > > Some of the design decisions made in this file are geared towards > the future addition of allowing interleaved transmission (for > example, the fact that we track a separate 'struct transmission', > even though only one exists at a time for now; and the fact that > all reading is done through a helper function). > > Signed-off-by: Eric Blake <eblake@redhat.com> > ---> +#define nbd_config_help \ > + "socket=<SOCKNAME> (required) The Unix socket to connect to.\n" \ > + "export=<NAME> Export name to connect to (default "").\n" \That should be (default \"\"), of course. Will squash in when pushing.> +/* Write zeroes to the file. */ > +static int > +nbd_zero (void *handle, uint32_t count, uint64_t offset, int may_trim) > +{ > + struct handle *h = handle; > + uint32_t cmd = NBD_CMD_WRITE_ZEROES; > + int c; > + > + if (!(h->flags & NBD_FLAG_SEND_WRITE_ZEROES)) { > + /* Trigger a fall back to regular writing */ > + errno = EOPNOTSUPP; > + return -1;Potential issue: nbdkit has a max transmission size of 64M, but many other implementations (hello qemu) limit to 32M, so the NBD spec mentions 32M as the portable safe size. If the client sends a write-zeroes request of 128M, the fallback in plugins.c will fragment it into two 64M transactions with actual buffers; which all the other plugins have been able to handle - but now that we are talking to another NBD server that might have a smaller transaction limit, we could kill our connection instead of succeeding. Similarly, if we have a client that can send a 64M read request, but nbdkit forwards on to a server that refuses to reply with that much data, we can get an error where success would have been possible had we fragmented (but at least in that case, we're less likely to kill the connection). We really need to teach nbdkit to use NBD_OPT_GO, which lets servers advertise their maximum transaction size to the client, and then honor that maximum; but in the meantime, I think we probably ought to consider scaling nbdkit's max transmission size down to 32M to match other implementations (either at the connections.c level for ALL plugins, or in nbd_write() in this file for JUST the nbd plugin). Will be a followup patch. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Eric Blake
2017-Nov-16 14:02 UTC
Re: [Libguestfs] [nbdkit PATCH v2 2/2] nbd: Split reading into separate thread
On 11/14/2017 01:23 PM, Eric Blake wrote:> In preparation for allowing interleaved response, refactor the > nbd forwarder so that writes are still done from the thread > handling the original request from the client, but all reads are > done by a dedicated reader thread. Control between the two > flags is gated by a mutex for storing the transactions/flags/threads/> information, coupled with a pipe for the reader thread to send > the final status back to the handler thread. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > plugins/nbd/nbd.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++-------- > 1 file changed, 100 insertions(+), 17 deletions(-) >I don't mind pushing 1/2 right away, but I'd like to get my work on PARALLEL support polished and post my 3/2 with full parallel forwarding before I push 2/2. Hopefully later today (last night, I had parallel file plugin nearly working, except for the race during test-socket-activation dying when plugin_name() was called after .unload...) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
On 11/16/2017 05:53 AM, Richard W.M. Jones wrote:> > ACK series, with one minor change: > > Please add a reference to the new manual page in the ‘SEE ALSO’ > section at the bottom of the following two files: > > docs/nbdkit.pod > docs/nbdkit-plugin.pod > > It should be pretty clear what to change if you look at those files.Yep, easy enough to see what to change there. I also just realized that my original use case was for: new client -> IPv4 encrypted -> forwarder -> Unix unencrypted -> old server as a way to allow encrypted network traffic without having to rewrite an old server; but it would also be nice to allow: old client -> Unix unencrypted -> forwarder -> IPv4 encrypted -> new server as a way to add encrypted network traffic without having to rewrite an old client. So it does look like I want to teach the forwarder about TLS connections after all, someday. (The whole idea of a plugin that lets you add arbitrary features on top of another existing NBD endpoint is cool!) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
On 11/16/2017 05:53 AM, Richard W.M. Jones wrote:> > ACK series, with one minor change: > > Please add a reference to the new manual page in the ‘SEE ALSO’ > section at the bottom of the following two files: > > docs/nbdkit.podThis one is obvious; I also added: diff --git i/docs/nbdkit.pod w/docs/nbdkit.pod index 155fcb5..e3043ba 100644 --- i/docs/nbdkit.pod +++ w/docs/nbdkit.pod @@ -463,6 +463,7 @@ Some common clients and the protocol they require: nbd-client < 3.10 client can talk either protocol nbd-client >= 3.10 newstyle any TLS (encrypted) client newstyle + nbdkit nbd plugin client can talk either protocol If you use qemu E<le> 2.5 without the exportname field against a newstyle server, it will give the error:> docs/nbdkit-plugin.podThis one seems to focus more on the cross-language bindings (the SEE ALSO does not mention nbdkit-file-plugin, for example), so I'm not sure if you still want it there. We could always do a followup to expand the list of SEE ALSO to all nbdkit-related pages, but that's more than just nbd, so I left that file unchanged for now. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2017-Nov-17 22:56 UTC
Re: [Libguestfs] [nbdkit PATCH v2 1/2] nbd: Add new nbd forwarding plugin
This one introduces a warning (error because I'm using -Werror): libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../.. -I../../include -I../../src -Wall -Werror -g -O2 -MT nbdkit_nbd_plugin_la-nbd.lo -MD -MP -MF .deps/nbdkit_nbd_plugin_la-nbd.Tpo -c nbd.c -fPIC -DPIC -o .libs/nbdkit_nbd_plugin_la-nbd.o nbd.c: In function ‘nbd_reply’: nbd.c:306:19: error: ‘trans.cookie’ may be used uninitialized in this function [-Werror=maybe-uninitialized] assert (err < 0 || cookie == trans.cookie); ^~ cc1: all warnings being treated as errors make[3]: *** [Makefile:531: nbdkit_nbd_plugin_la-nbd.lo] Error 1 Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Apparently Analagous Threads
- [nbdkit PATCH v2 0/4] enable parallel nbd forwarding
- [nbdkit PATCH v2 0/5] Play with libnbd for nbdkit-nbd
- [nbdkit PATCH 0/4] Play with libnbd for nbdkit-add
- [nbdkit PATCH v3 0/5] Play with libnbd for nbdkit-nbd
- [nbdkit PATCH v2 0/5] structured replies/.extents for nbd plugin