Richard W.M. Jones
2020-Mar-17 08:53 UTC
[Libguestfs] [PATCH nbdkit v2] New tmpdisk plugin.
This can be used for creating temporary disks to thin clients, as a kind of "remote tmpfs". See also: https://www.redhat.com/archives/libguestfs/2020-March/msg00134.html --- plugins/data/nbdkit-data-plugin.pod | 1 + plugins/file/nbdkit-file-plugin.pod | 1 + plugins/linuxdisk/nbdkit-linuxdisk-plugin.pod | 7 +- plugins/memory/nbdkit-memory-plugin.pod | 3 +- plugins/tmpdisk/nbdkit-tmpdisk-plugin.pod | 157 +++++++ configure.ac | 2 + plugins/tmpdisk/Makefile.am | 66 +++ tests/Makefile.am | 21 + plugins/tmpdisk/tmpdisk.c | 427 ++++++++++++++++++ tests/test-tmpdisk.c | 157 +++++++ .gitignore | 1 + 11 files changed, 840 insertions(+), 3 deletions(-) diff --git a/plugins/data/nbdkit-data-plugin.pod b/plugins/data/nbdkit-data-plugin.pod index 057392d3..ef8d1e08 100644 --- a/plugins/data/nbdkit-data-plugin.pod +++ b/plugins/data/nbdkit-data-plugin.pod @@ -269,6 +269,7 @@ L<nbdkit-null-plugin(1)>, L<nbdkit-partitioning-plugin(1)>, L<nbdkit-pattern-plugin(1)>, L<nbdkit-random-plugin(1)>, +L<nbdkit-tmpdisk-plugin(1)>, L<nbdkit-zero-plugin(1)>, L<https://github.com/libguestfs/nbdkit/blob/master/plugins/data/disk2data.pl>, L<https://en.wikipedia.org/wiki/Base64>. diff --git a/plugins/file/nbdkit-file-plugin.pod b/plugins/file/nbdkit-file-plugin.pod index d538b127..0c1cfd57 100644 --- a/plugins/file/nbdkit-file-plugin.pod +++ b/plugins/file/nbdkit-file-plugin.pod @@ -111,6 +111,7 @@ L<nbdkit(1)>, L<nbdkit-plugin(3)>, L<nbdkit-split-plugin(1)>, L<nbdkit-partitioning-plugin(1)>, +L<nbdkit-tmpdisk-plugin(1)>, L<nbdkit-noextents-filter(1)>. =head1 AUTHORS diff --git a/plugins/linuxdisk/nbdkit-linuxdisk-plugin.pod b/plugins/linuxdisk/nbdkit-linuxdisk-plugin.pod index 3cac883c..53f4d89d 100644 --- a/plugins/linuxdisk/nbdkit-linuxdisk-plugin.pod +++ b/plugins/linuxdisk/nbdkit-linuxdisk-plugin.pod @@ -24,7 +24,9 @@ symbolic links, block special devices etc. To create a FAT-formatted virtual floppy disk, see L<nbdkit-floppy-plugin(1)>. To create a CD/ISO, see -L<nbdkit-iso-plugin(1)>. +L<nbdkit-iso-plugin(1)>. To create an empty filesystem for each +client that connects (like a "remote tmpfs") use +L<nbdkit-tmpdisk-plugin(1)>. =head1 EXAMPLES @@ -184,7 +186,8 @@ L<nbdkit-file-plugin(1)>, L<nbdkit-floppy-plugin(1)>, L<nbdkit-iso-plugin(1)>, L<nbdkit-partition-filter(1)>, -L<nbdkit-partitioning-plugin(1)>. +L<nbdkit-partitioning-plugin(1)>, +L<nbdkit-tmpdisk-plugin(1)>, =head1 AUTHORS diff --git a/plugins/memory/nbdkit-memory-plugin.pod b/plugins/memory/nbdkit-memory-plugin.pod index ccdc017d..bc565c55 100644 --- a/plugins/memory/nbdkit-memory-plugin.pod +++ b/plugins/memory/nbdkit-memory-plugin.pod @@ -103,7 +103,8 @@ L<nbdkit-plugin(3)>, L<nbdkit-loop(1)>, L<nbdkit-data-plugin(1)>, L<nbdkit-file-plugin(1)>, -L<nbdkit-info-plugin(1)>. +L<nbdkit-info-plugin(1)>, +L<nbdkit-tmpdisk-plugin(1)>. =head1 AUTHORS diff --git a/plugins/tmpdisk/nbdkit-tmpdisk-plugin.pod b/plugins/tmpdisk/nbdkit-tmpdisk-plugin.pod new file mode 100644 index 00000000..925a5091 --- /dev/null +++ b/plugins/tmpdisk/nbdkit-tmpdisk-plugin.pod @@ -0,0 +1,157 @@ +=head1 NAME + +nbdkit-tmpdisk-plugin - create a fresh temporary filesystem for each client + +=head1 SYNOPSIS + + nbdkit tmpdisk [size=]SIZE + [type=ext4|xfs|vfat|...] [label=LABEL] + [command=COMMAND] + +=head1 DESCRIPTION + +This L<nbdkit(1)> plugin is used for creating temporary filesystems +for thin clients. Each time a client connects it will see a fresh, +empty filesystem for its exclusive use. B<The filesystem is deleted> +when the client disconnects. + +When a new client connects, a blank, sparse file of the required size +is created in C<$TMPDIR> (or F</var/tmp>). L<mkfs(8)> is then run on +the file to create the empty filesystem, and this filesystem is served +to the client. Unlike L<nbdkit-linuxdisk-plugin(1)> each client of +this plugin sees a different disk. + +The size of the disk is chosen using the C<size> parameter. The +filesystem type is C<ext4> but this can be changed using the C<type> +parameter (controlling the I<-t> option of mkfs). + +Instead of running mkfs you can run an arbitrary C<command> to create +the disk. + +=head2 Example + +One use for this is to create a kind of "remote L<tmpfs(5)>" for thin +clients. On the server you would run (see L<nbdkit-service(1)>): + + nbdkit tmpdisk 16G + +and each thin client would have a file F</etc/rc.d/rc.local> +containing: + + nm-online + modprobe nbd + nbd-client server /dev/nbd0 + mount /dev/nbd0 /var/scratch + +Clients would see a fresh, empty C</var/scratch> directory after boot. + +=head2 Security considerations + +Because each client gets a new disk, the amount of disk space +required on the server can be as much as +S<<< I<number of clients> × I<size parameter> >>>. It is therefore +best to limit the number of clients using L<nbdkit-limit-filter(1)> or +take steps to limit where clients can connect from using +L<nbdkit-ip-filter(1)>, firewalls, or TLS client certificates. + +=head1 PARAMETERS + +=over 4 + +=item B<command='>COMMANDB<'> + +Instead of running L<mkfs(8)> to create the initial filesystem, run +C<COMMAND> (which usually must be quoted to protect it from the +shell). The following shell variables may be used in C<COMMAND>: + +=over 4 + +=item C<$disk> + +The absolute path of the file that the command must initialize. This +file is created by nbdkit before the command runs. + +=item C<$label> + +The filesystem label (from the C<label> parameter). + +=item C<$size> + +The virtual size in bytes, which is the same as the file size. + +=item C<$type> + +The filesystem type (from the C<type> parameter), defaulting to +C<ext4>. (Commands can ignore this if it is not relevant). + +=back + +=item B<label=>LABEL + +Select the filesystem label. The default is not set. + +=item [B<size=>]SIZE + +Specify the virtual size of the disk image. + +This parameter is required. + +C<size=> is a magic config key and may be omitted in most cases. +See L<nbdkit(1)/Magic parameters>. + +=item B<type=>FS + +Select the filesystem type. The default is C<ext4>. Most +non-networked, non-cluster filesystem types supported by the +L<mkfs(8)> command can be used here. + +=back + +=head1 ENVIRONMENT VARIABLES + +=over 4 + +=item C<TMPDIR> + +The temporary disks for this plugin are created in this directory, one +per connected client. If not set this defaults to F</var/tmp>. + +=back + +=head1 FILES + +=over 4 + +=item F<$plugindir/nbdkit-tmpdisk-plugin.so> + +The plugin. + +Use C<nbdkit --dump-config> to find the location of C<$plugindir>. + +=back + +=head1 VERSION + +C<nbdkit-tmpdisk-plugin> first appeared in nbdkit 1.20. + +=head1 SEE ALSO + +L<nbdkit(1)>, +L<nbdkit-plugin(3)>, +L<nbdkit-data-plugin(1)>, +L<nbdkit-file-plugin(1)>, +L<nbdkit-ip-filter(1)>, +L<nbdkit-limit-filter(1)>, +L<nbdkit-linuxdisk-plugin(1)>, +L<nbdkit-memory-plugin(1)>, +L<nbdkit-loop(1)>, +L<nbdkit-tls(1)>, +L<mkfs(8)>. + +=head1 AUTHORS + +Richard W.M. Jones + +=head1 COPYRIGHT + +Copyright (C) 2018-2020 Red Hat Inc. diff --git a/configure.ac b/configure.ac index dd9ca765..7b5e61e3 100644 --- a/configure.ac +++ b/configure.ac @@ -86,6 +86,7 @@ non_lang_plugins="\ ssh \ streaming \ tar \ + tmpdisk \ vddk \ zero \ " @@ -1016,6 +1017,7 @@ AC_CONFIG_FILES([Makefile plugins/streaming/Makefile plugins/tar/Makefile plugins/tcl/Makefile + plugins/tmpdisk/Makefile plugins/vddk/Makefile plugins/zero/Makefile filters/Makefile diff --git a/plugins/tmpdisk/Makefile.am b/plugins/tmpdisk/Makefile.am new file mode 100644 index 00000000..2e487e92 --- /dev/null +++ b/plugins/tmpdisk/Makefile.am @@ -0,0 +1,66 @@ +# nbdkit +# Copyright (C) 2017-2020 Red Hat Inc. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +include $(top_srcdir)/common-rules.mk + +EXTRA_DIST = nbdkit-tmpdisk-plugin.pod + +plugin_LTLIBRARIES = nbdkit-tmpdisk-plugin.la + +nbdkit_tmpdisk_plugin_la_SOURCES = \ + tmpdisk.c \ + $(top_srcdir)/include/nbdkit-plugin.h \ + $(NULL) + +nbdkit_tmpdisk_plugin_la_CPPFLAGS = \ + -I$(top_srcdir)/include \ + -I$(top_srcdir)/common/utils \ + $(NULL) +nbdkit_tmpdisk_plugin_la_CFLAGS = $(WARNINGS_CFLAGS) +nbdkit_tmpdisk_plugin_la_LDFLAGS = \ + -module -avoid-version -shared \ + -Wl,--version-script=$(top_srcdir)/plugins/plugins.syms \ + $(NULL) +nbdkit_tmpdisk_plugin_la_LIBADD = \ + $(top_builddir)/common/utils/libutils.la \ + $(NULL) + +if HAVE_POD + +man_MANS = nbdkit-tmpdisk-plugin.1 +CLEANFILES += $(man_MANS) + +nbdkit-tmpdisk-plugin.1: nbdkit-tmpdisk-plugin.pod + $(PODWRAPPER) --section=1 --man $@ \ + --html $(top_builddir)/html/$@.html \ + $< + +endif HAVE_POD diff --git a/tests/Makefile.am b/tests/Makefile.am index 65dd148d..17f2c8da 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -766,6 +766,27 @@ test_streaming_SOURCES = test-streaming.c test_streaming_CFLAGS = $(WARNINGS_CFLAGS) $(LIBNBD_CFLAGS) test_streaming_LDADD = $(LIBNBD_LIBS) +# tmpdisk plugin test. +LIBGUESTFS_TESTS += test-tmpdisk + +test_tmpdisk_SOURCES = \ + test-tmpdisk.c \ + test.h \ + $(NULL) +test_tmpdisk_CPPFLAGS = \ + -I$(top_srcdir)/common/utils +test_tmpdisk_CFLAGS = \ + $(WARNINGS_CFLAGS) \ + $(LIBGUESTFS_CFLAGS) \ + $(NULL) +test_tmpdisk_LDFLAGS = \ + $(top_builddir)/common/utils/libutils.la \ + $(NULL) +test_tmpdisk_LDADD = \ + libtest.la \ + $(LIBGUESTFS_LIBS) \ + $(NULL) + if HAVE_VDDK # VDDK plugin test. # This only tests that the plugin can be loaded against a diff --git a/plugins/tmpdisk/tmpdisk.c b/plugins/tmpdisk/tmpdisk.c new file mode 100644 index 00000000..8fba3359 --- /dev/null +++ b/plugins/tmpdisk/tmpdisk.c @@ -0,0 +1,427 @@ +/* nbdkit + * Copyright (C) 2017-2020 Red Hat Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <stdint.h> +#include <inttypes.h> +#include <string.h> +#include <unistd.h> +#include <fcntl.h> +#include <errno.h> +#include <sys/types.h> +#include <sys/wait.h> + +#define NBDKIT_API_VERSION 2 +#include <nbdkit-plugin.h> + +#include "cleanup.h" +#include "utils.h" + +static int64_t size = -1; + +static const char *command + "labelopt='-L'\n" + "case \"$type\" in\n" + " ext?)\n" + " extra='-F' ;;\n" + " *fat|msdos)\n" + " extra='-I' ;;\n" + " ntfs)\n" + " extra='-Q -F'\n" + " labelopt='-n' ;;\n" + " xfs)\n" + " extra='-f' ;;\n" + "esac\n" + "if [ \"x$label\" = \"x\" ]; then\n" + " mkfs -t \"$type\" $extra $disk >/dev/null\n" + "else\n" + " mkfs -t \"$type\" $extra $labelopt \"$label\" $disk >/dev/null\n" + "fi\n"; +static const char *label = NULL; +static const char *type = "ext4"; + +static int +tmpdisk_config (const char *key, const char *value) +{ + if (strcmp (key, "command") == 0) { + command = value; + } + else if (strcmp (key, "label") == 0) { + if (strcmp (value, "") == 0) + label = NULL; + else + label = value; + } + else if (strcmp (key, "size") == 0) { + size = nbdkit_parse_size (value); + if (size == -1) + return -1; + } + else if (strcmp (key, "type") == 0) { + type = value; + } + else { + nbdkit_error ("unknown parameter '%s'", key); + return -1; + } + + return 0; +} + +static int +tmpdisk_config_complete (void) +{ + if (size == -1) { + nbdkit_error ("size parameter is required"); + return -1; + } + + return 0; +} + +#define tmpdisk_config_help \ + "size=<SIZE> (required) Virtual filesystem size.\n" \ + "label=<LABEL> The filesystem label.\n" \ + "type=ext4|... The filesystem type.\n" \ + "command=<COMMAND> Alternate command instead of mkfs." + +struct handle { + int fd; + bool can_punch_hole; +}; + +/* Absolutely unsafe! Although this is simply returning the default + * value, provide this callback to make it clear. + */ +static int +tmpdisk_can_multi_conn (void *handle) +{ + return 0; +} + +static int +tmpdisk_can_trim (void *handle) +{ +#ifdef FALLOC_FL_PUNCH_HOLE + return 1; +#else + return 0; +#endif +} + +/* Pretend we have native FUA support, but actually because all disks + * are temporary we will deliberately ignore flush/FUA operations. + */ +static int +tmpdisk_can_fua (void *handle) +{ + return NBDKIT_FUA_NATIVE; +} + +static int64_t +tmpdisk_get_size (void *handle) +{ + return size; +} + +/* This creates and runs the full "mkfs" (or whatever) command. */ +static int +run_command (const char *disk) +{ + FILE *fp; + CLEANUP_FREE char *cmd = NULL; + size_t len = 0; + int r; + + fp = open_memstream (&cmd, &len); + if (fp == NULL) { + nbdkit_error ("open_memstream: %m"); + return -1; + } + + /* Set the shell variables. */ + fprintf (fp, "disk="); + shell_quote (disk, fp); + putc ('\n', fp); + if (label) { + fprintf (fp, "label="); + shell_quote (label, fp); + putc ('\n', fp); + } + fprintf (fp, "size=%" PRIi64 "\n", size); + fprintf (fp, "type="); + shell_quote (type, fp); + putc ('\n', fp); + + putc ('\n', fp); + fprintf (fp, "%s", command); + + if (fclose (fp) == EOF) { + nbdkit_error ("memstream failed"); + return -1; + } + + r = system (cmd); + if (r == -1) { + nbdkit_error ("failed to execute command: %m"); + return -1; + } + if (WIFEXITED (r) && WEXITSTATUS (r) != 0) { + nbdkit_error ("command exited with code %d", WEXITSTATUS (r)); + return -1; + } + else if (WIFSIGNALED (r)) { + nbdkit_error ("command killed by signal %d", WTERMSIG (r)); + return -1; + } + else if (WIFSTOPPED (r)) { + nbdkit_error ("command stopped by signal %d", WSTOPSIG (r)); + return -1; + } + + return 0; +} + +static void * +tmpdisk_open (int readonly) +{ + struct handle *h; + CLEANUP_FREE char *disk = NULL; + const char *tmpdir; + + tmpdir = getenv ("TMPDIR"); + if (!tmpdir) + tmpdir = "/var/tmp"; + + h = malloc (sizeof *h); + if (h == NULL) { + nbdkit_error ("malloc: %m"); + goto error; + } + h->fd = -1; + h->can_punch_hole = true; + + /* Create the new disk image for this connection. */ + if (asprintf (&disk, "%s/tmpdiskXXXXXX", tmpdir) == -1) { + nbdkit_error ("asprintf: %m"); + goto error; + } + + h->fd = mkstemp (disk); + if (h->fd == -1) { + nbdkit_error ("mkstemp: %m"); + goto error; + } + + /* Truncate the disk to a sparse file of the right size. */ + if (ftruncate (h->fd, size) == -1) { + nbdkit_error ("ftruncate: %s: %m", disk); + goto error; + } + + /* Now run the mkfs command. */ + if (run_command (disk) == -1) + goto error; + + /* We don't need the disk to appear in the filesystem since we hold + * a file descriptor and access it through that, so unlink the disk. + * This also ensures it is always cleaned up. + */ + unlink (disk); + + /* Return the handle. */ + return h; + + error: + if (h) { + if (h->fd >= 0) { + close (h->fd); + unlink (disk); + } + free (h); + } + return NULL; +} + +static void +tmpdisk_close (void *handle) +{ + struct handle *h = handle; + + close (h->fd); + free (h); +} + +/* Read data from the file. */ +static int +tmpdisk_pread (void *handle, void *buf, + uint32_t count, uint64_t offset, + uint32_t flags) +{ + struct handle *h = handle; + + while (count > 0) { + ssize_t r = pread (h->fd, buf, count, offset); + if (r == -1) { + nbdkit_error ("pread: %m"); + return -1; + } + if (r == 0) { + nbdkit_error ("pread: unexpected end of file"); + return -1; + } + buf += r; + count -= r; + offset += r; + } + + return 0; +} + +/* Write data to the file. */ +static int +tmpdisk_pwrite (void *handle, const void *buf, + uint32_t count, uint64_t offset, + uint32_t flags) +{ + struct handle *h = handle; + + while (count > 0) { + ssize_t r = pwrite (h->fd, buf, count, offset); + if (r == -1) { + nbdkit_error ("pwrite: %m"); + return -1; + } + buf += r; + count -= r; + offset += r; + } + + /* Deliberately ignore FUA if present in flags. */ + + return 0; +} + +/* This plugin deliberately provides a null flush operation, because + * all of the disks created are temporary. + */ +static int +tmpdisk_flush (void *handle, uint32_t flags) +{ + return 0; +} + +#if defined (FALLOC_FL_PUNCH_HOLE) +static int +do_fallocate (int fd, int mode, off_t offset, off_t len) +{ + int r = fallocate (fd, mode, offset, len); + if (r == -1 && errno == ENODEV) { + /* kernel 3.10 fails with ENODEV for block device. Kernel >= 4.9 fails + * with EOPNOTSUPP in this case. Normalize errno to simplify callers. + */ + errno = EOPNOTSUPP; + } + return r; +} + +static bool +is_enotsup (int err) +{ + return err == ENOTSUP || err == EOPNOTSUPP; +} +#endif + +/* Punch a hole in the file. */ +static int +tmpdisk_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ +#ifdef FALLOC_FL_PUNCH_HOLE + struct handle *h = handle; + int r; + + if (h->can_punch_hole) { + r = do_fallocate (h->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, + offset, count); + if (r == -1) { + /* Trim is advisory; we don't care if it fails for anything other + * than EIO or EPERM. + */ + if (errno == EPERM || errno == EIO) { + nbdkit_error ("fallocate: %m"); + return -1; + } + + if (is_enotsup (EOPNOTSUPP)) + h->can_punch_hole = false; + + nbdkit_debug ("ignoring failed fallocate during trim: %m"); + } + } +#endif + + /* Deliberately ignore FUA if present in flags. */ + + return 0; +} + +#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL + +static struct nbdkit_plugin plugin = { + .name = "tmpdisk", + .version = PACKAGE_VERSION, + + .config = tmpdisk_config, + .config_complete = tmpdisk_config_complete, + .config_help = tmpdisk_config_help, + .magic_config_key = "size", + + .can_multi_conn = tmpdisk_can_multi_conn, + .can_trim = tmpdisk_can_trim, + .can_fua = tmpdisk_can_fua, + .get_size = tmpdisk_get_size, + + .open = tmpdisk_open, + .close = tmpdisk_close, + .pread = tmpdisk_pread, + .pwrite = tmpdisk_pwrite, + .flush = tmpdisk_flush, + .trim = tmpdisk_trim, + + .errno_is_preserved = 1, +}; + +NBDKIT_REGISTER_PLUGIN(plugin) diff --git a/tests/test-tmpdisk.c b/tests/test-tmpdisk.c new file mode 100644 index 00000000..e96f1b82 --- /dev/null +++ b/tests/test-tmpdisk.c @@ -0,0 +1,157 @@ +/* nbdkit + * Copyright (C) 2013-2020 Red Hat Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <stdint.h> +#include <inttypes.h> +#include <string.h> +#include <unistd.h> + +#include <guestfs.h> + +#include "test.h" + +int +main (int argc, char *argv[]) +{ + guestfs_h *g1, *g2; + int r; + char *label; + + /* Start nbdkit. */ + if (test_start_nbdkit ("tmpdisk", "1G", "label=TEST", NULL) == -1) + exit (EXIT_FAILURE); + + /* We can open multiple connections and they should see different + * disks. + */ + g1 = guestfs_create (); + if (g1 == NULL) { + perror ("guestfs_create"); + exit (EXIT_FAILURE); + } + guestfs_set_identifier (g1, "g1"); + + r = guestfs_add_drive_opts (g1, "", + GUESTFS_ADD_DRIVE_OPTS_FORMAT, "raw", + GUESTFS_ADD_DRIVE_OPTS_PROTOCOL, "nbd", + GUESTFS_ADD_DRIVE_OPTS_SERVER, server, + -1); + if (r == -1) + exit (EXIT_FAILURE); + + if (guestfs_launch (g1) == -1) + exit (EXIT_FAILURE); + + g2 = guestfs_create (); + if (g2 == NULL) { + perror ("guestfs_create"); + exit (EXIT_FAILURE); + } + guestfs_set_identifier (g2, "g2"); + + r = guestfs_add_drive_opts (g2, "", + GUESTFS_ADD_DRIVE_OPTS_FORMAT, "raw", + GUESTFS_ADD_DRIVE_OPTS_PROTOCOL, "nbd", + GUESTFS_ADD_DRIVE_OPTS_SERVER, server, + -1); + if (r == -1) + exit (EXIT_FAILURE); + + if (guestfs_launch (g2) == -1) + exit (EXIT_FAILURE); + + /* But they should both see the same filesystem label. */ + label = guestfs_vfs_label (g1, "/dev/sda"); + if (!label) + exit (EXIT_FAILURE); + if (strcmp (label, "TEST") != 0) { + fprintf (stderr, "%s FAILED: unexpected label: %s\n", + program_name, label); + exit (EXIT_FAILURE); + } + free (label); + + label = guestfs_vfs_label (g2, "/dev/sda"); + if (!label) + exit (EXIT_FAILURE); + if (strcmp (label, "TEST") != 0) { + fprintf (stderr, "%s FAILED: unexpected label: %s\n", + program_name, label); + exit (EXIT_FAILURE); + } + free (label); + + /* Mount both disks. */ + if (guestfs_mount (g1, "/dev/sda", "/") == -1) + exit (EXIT_FAILURE); + if (guestfs_mount (g2, "/dev/sda", "/") == -1) + exit (EXIT_FAILURE); + + /* Create some files and directories on each. */ + if (guestfs_mkdir (g1, "/test1") == -1) + exit (EXIT_FAILURE); + if (guestfs_touch (g1, "/test1/file1") == -1) + exit (EXIT_FAILURE); + if (guestfs_mkdir (g2, "/test2") == -1) + exit (EXIT_FAILURE); + if (guestfs_touch (g2, "/test2/file2") == -1) + exit (EXIT_FAILURE); + + if (guestfs_sync (g1) == -1 || guestfs_sync (g2) == -1) + exit (EXIT_FAILURE); + + if (guestfs_is_file (g1, "/test1/file1") != 1) { + fprintf (stderr, "%s FAILED: /test1/file1 is not a file\n", + program_name); + exit (EXIT_FAILURE); + } + if (guestfs_is_file (g2, "/test2/file2") != 1) { + fprintf (stderr, "%s FAILED: /test2/file2 is not a file\n", + program_name); + exit (EXIT_FAILURE); + } + + /* Shut down the connection. */ + if (guestfs_shutdown (g1) == -1) + exit (EXIT_FAILURE); + if (guestfs_shutdown (g2) == -1) + exit (EXIT_FAILURE); + guestfs_close (g1); + guestfs_close (g2); + + exit (EXIT_SUCCESS); +} diff --git a/.gitignore b/.gitignore index ae4aaf3c..ae4e5061 100644 --- a/.gitignore +++ b/.gitignore @@ -134,6 +134,7 @@ plugins/*/*.3 /tests/test-split /tests/test-streaming /tests/test-tcl +/tests/test-tmpdisk /tests/test-xz /tests/test-xz-curl /test-driver -- 2.25.0
On 3/17/20 3:53 AM, Richard W.M. Jones wrote:> This can be used for creating temporary disks to thin clients, as a > kind of "remote tmpfs". > > See also: > https://www.redhat.com/archives/libguestfs/2020-March/msg00134.html > ---> +++ b/plugins/tmpdisk/nbdkit-tmpdisk-plugin.pod > @@ -0,0 +1,157 @@ > +=head1 NAME > + > +nbdkit-tmpdisk-plugin - create a fresh temporary filesystem for each client > + > +=head1 SYNOPSIS > + > + nbdkit tmpdisk [size=]SIZE > + [type=ext4|xfs|vfat|...] [label=LABEL] > + [command=COMMAND] > + > +=head1 DESCRIPTION > + > +This L<nbdkit(1)> plugin is used for creating temporary filesystems > +for thin clients. Each time a client connects it will see a fresh, > +empty filesystem for its exclusive use. B<The filesystem is deleted> > +when the client disconnects. > + > +When a new client connects, a blank, sparse file of the required size > +is created in C<$TMPDIR> (or F</var/tmp>). L<mkfs(8)> is then run on > +the file to create the empty filesystem, and this filesystem is served > +to the client. Unlike L<nbdkit-linuxdisk-plugin(1)> each client of > +this plugin sees a different disk.As you said earlier, we may someday want a mode where the cow filter or similar can provide per-client transience to any plugin, but this plugin does indeed seem useful on its own.> +=head2 Security considerations > + > +Because each client gets a new disk, the amount of disk space > +required on the server can be as much as > +S<<< I<number of clients> × I<size parameter> >>>. It is therefore > +best to limit the number of clients using L<nbdkit-limit-filter(1)> or > +take steps to limit where clients can connect from using > +L<nbdkit-ip-filter(1)>, firewalls, or TLS client certificates.Good advice.> + > +=head1 PARAMETERS > + > +=over 4 > + > +=item B<command='>COMMANDB<'> > + > +Instead of running L<mkfs(8)> to create the initial filesystem, run > +C<COMMAND> (which usually must be quoted to protect it from the > +shell). The following shell variables may be used in C<COMMAND>:As we run COMMAND through /bin/sh, the nbdkit user can inject ANY action they want, but I don't see that as inherently risky as nbdkit is not running with more privileges than the user already has.> +++ b/plugins/tmpdisk/tmpdisk.c > @@ -0,0 +1,427 @@> +static const char *command > + "labelopt='-L'\n" > + "case \"$type\" in\n" > + " ext?)\n" > + " extra='-F' ;;\n" > + " *fat|msdos)\n" > + " extra='-I' ;;\n" > + " ntfs)\n" > + " extra='-Q -F'\n" > + " labelopt='-n' ;;\n" > + " xfs)\n" > + " extra='-f' ;;\n" > + "esac\n" > + "if [ \"x$label\" = \"x\" ]; then\n" > + " mkfs -t \"$type\" $extra $disk >/dev/null\n" > + "else\n" > + " mkfs -t \"$type\" $extra $labelopt \"$label\" $disk >/dev/null\n" > + "fi\n"; > +static const char *label = NULL; > +static const char *type = "ext4";At any rate, it looks like you are mostly safely quoting the default command for invoking mkfs. But you do have a bug: you are using $disk unquoted, which is okay when TMPDIR is undefined but not always, because...[1]> + > +/* Absolutely unsafe! Although this is simply returning the default > + * value, provide this callback to make it clear. > + */ > +static int > +tmpdisk_can_multi_conn (void *handle) > +{ > + return 0;The comment threw me for just a second, until I realized you meant that 'Advertising multi-conn is absolutely unsafe!' and not 'the implementation of this callback is absolutely unsafe'.> +} > + > +static int > +tmpdisk_can_trim (void *handle) > +{ > +#ifdef FALLOC_FL_PUNCH_HOLE > + return 1; > +#else > + return 0; > +#endif > +} > + > +/* Pretend we have native FUA support, but actually because all disks > + * are temporary we will deliberately ignore flush/FUA operations. > + */ > +static int > +tmpdisk_can_fua (void *handle) > +{ > + return NBDKIT_FUA_NATIVE; > +}Agreed.> + > +static int64_t > +tmpdisk_get_size (void *handle) > +{ > + return size; > +} > + > +/* This creates and runs the full "mkfs" (or whatever) command. */ > +static int > +run_command (const char *disk) > +{ > + FILE *fp; > + CLEANUP_FREE char *cmd = NULL; > + size_t len = 0; > + int r; > + > + fp = open_memstream (&cmd, &len); > + if (fp == NULL) { > + nbdkit_error ("open_memstream: %m"); > + return -1; > + } > + > + /* Set the shell variables. */ > + fprintf (fp, "disk="); > + shell_quote (disk, fp);[1]...here you are correctly quoting the definition of disk, and...[2]> + putc ('\n', fp); > + if (label) { > + fprintf (fp, "label="); > + shell_quote (label, fp); > + putc ('\n', fp); > + } > + fprintf (fp, "size=%" PRIi64 "\n", size); > + fprintf (fp, "type="); > + shell_quote (type, fp); > + putc ('\n', fp); > + > + putc ('\n', fp); > + fprintf (fp, "%s", command); > + > + if (fclose (fp) == EOF) { > + nbdkit_error ("memstream failed"); > + return -1; > + } > + > + r = system (cmd);Hmm. We can be invoking multiple system() calls across multiple threads if clients connect in parallel. That means we MUST be sure we don't leak unintended fds...[3] Hmm - what happens under 'nbdkit -s tmpdisk $size'? I suspect the script passed to system() should be changed to start with 'exec </dev/null >/dev/null' to ensure that mkfs or the user command cannot corrupt the NBD client by writing to stdout or confuse nbdkit itself by accidentally consuming from stdin (leaving stderr unchanged should be fine, though). Or is that something that 'nbdkit -s' should do automatically regardless of the plugin (that is, dup() off the original stdin/stdout used to talk to the client, and then changing stdin/stdout to /dev/null prior to .load()ing the plugin)? And if we change nbdkit itself for -s, should we also make sure that there is no cross-process interaction between the fds used by --run and those handed to the captive child nbdkit process?> + if (r == -1) { > + nbdkit_error ("failed to execute command: %m"); > + return -1; > + } > + if (WIFEXITED (r) && WEXITSTATUS (r) != 0) { > + nbdkit_error ("command exited with code %d", WEXITSTATUS (r)); > + return -1; > + } > + else if (WIFSIGNALED (r)) { > + nbdkit_error ("command killed by signal %d", WTERMSIG (r)); > + return -1; > + } > + else if (WIFSTOPPED (r)) { > + nbdkit_error ("command stopped by signal %d", WSTOPSIG (r)); > + return -1; > + }Generally, WIFSTOPPED() should be unreachable for the result of system(). I don't mind if you leave it for ease of copy/paste, but it doesn't hurt to drop it.> + > + return 0; > +} > + > +static void * > +tmpdisk_open (int readonly) > +{ > + struct handle *h; > + CLEANUP_FREE char *disk = NULL; > + const char *tmpdir; > + > + tmpdir = getenv ("TMPDIR"); > + if (!tmpdir) > + tmpdir = "/var/tmp";Rather than calling getenv() for every client, should we pre-populate a file-scope variable once during .get_ready? But since we have to malloc() per client, that micro-optimization is probably in the noise.> + > + h = malloc (sizeof *h); > + if (h == NULL) { > + nbdkit_error ("malloc: %m"); > + goto error; > + } > + h->fd = -1; > + h->can_punch_hole = true; > + > + /* Create the new disk image for this connection. */ > + if (asprintf (&disk, "%s/tmpdiskXXXXXX", tmpdir) == -1) {[2]...here, a user-supplied TMPDIR may require shell-quoting at the points above.> + nbdkit_error ("asprintf: %m"); > + goto error; > + } > + > + h->fd = mkstemp (disk);[3]...ouch - we are not using mkostemp() to atomically set FD_CLOEXEC, nor even mkstemp/fcntl() to set it after the fact due to Haiku still lacking mkostemp(). Note that in filters/cow/cow.c, we don't worry about a mutex when falling back due to missing mkostemp(), but that was done in .load, not .open. Here, Haiku would definitely need a mutex - but we would also need to make sure that such a mutex only guards the mkstemp/fcntl pair and not the system() call (there is no reason a second client must wait for the mkfs of the first client to finish). Or is mkfs specific enough to Linux that no one would ever need to build this plugin on Haiku? At any rate, without FD_CLOEXEC, and allowing parallel system() calls, we have to worry about the consequence of an fd leak: I think that mkfs is generally well-behaved at ignoring unknown fds, but if it is long-running, this could tie up the disk space used by a fast-exiting first client for longer than necessary while waiting for mkfs to finish for the second client. And we have no guarantee that a user-specified command is as well-behaved as mkfs about not interfering with unknown leaked fds.> + if (h->fd == -1) { > + nbdkit_error ("mkstemp: %m"); > + goto error; > + } > + > + /* Truncate the disk to a sparse file of the right size. */ > + if (ftruncate (h->fd, size) == -1) { > + nbdkit_error ("ftruncate: %s: %m", disk); > + goto error; > + } > + > + /* Now run the mkfs command. */ > + if (run_command (disk) == -1) > + goto error; > + > + /* We don't need the disk to appear in the filesystem since we hold > + * a file descriptor and access it through that, so unlink the disk. > + * This also ensures it is always cleaned up. > + */ > + unlink (disk);I like this aspect. It does mean that 'du' won't find the space hogs under $TMPDIR (lsof or similar will be needed), but I don't see that as a show-stopper.> +/* Write data to the file. */ > +static int > +tmpdisk_pwrite (void *handle, const void *buf, > + uint32_t count, uint64_t offset, > + uint32_t flags) > +{ > + struct handle *h = handle; > + > + while (count > 0) { > + ssize_t r = pwrite (h->fd, buf, count, offset); > + if (r == -1) { > + nbdkit_error ("pwrite: %m"); > + return -1; > + } > + buf += r; > + count -= r; > + offset += r; > + } > + > + /* Deliberately ignore FUA if present in flags. */Makes sense.> + > + return 0; > +} > + > +/* This plugin deliberately provides a null flush operation, because > + * all of the disks created are temporary. > + */ > +static int > +tmpdisk_flush (void *handle, uint32_t flags) > +{ > + return 0;As does this.> +/* Punch a hole in the file. */ > +static int > +tmpdisk_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags) > +{ > +#ifdef FALLOC_FL_PUNCH_HOLE > + struct handle *h = handle; > + int r; > + > + if (h->can_punch_hole) { > + r = do_fallocate (h->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, > + offset, count);Hmm - when this succeeds, we guarantee that we read back as zeroes. Is it worth also implementing .zero for this plugin, to take advantage of fallocate when flags permits zeroing by trimming, rather than having to always rely on nbdkit's emulation through .pwrite?> + > + .open = tmpdisk_open, > + .close = tmpdisk_close, > + .pread = tmpdisk_pread, > + .pwrite = tmpdisk_pwrite, > + .flush = tmpdisk_flush, > + .trim = tmpdisk_trim, > + > + .errno_is_preserved = 1, > +}; > +-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2020-Mar-17 11:58 UTC
Re: [Libguestfs] [PATCH nbdkit v2] New tmpdisk plugin.
On Tue, Mar 17, 2020 at 06:16:49AM -0500, Eric Blake wrote:> >+static void * > >+tmpdisk_open (int readonly) > >+{ > >+ struct handle *h; > >+ CLEANUP_FREE char *disk = NULL; > >+ const char *tmpdir; > >+ > >+ tmpdir = getenv ("TMPDIR"); > >+ if (!tmpdir) > >+ tmpdir = "/var/tmp"; > > Rather than calling getenv() for every client, should we > pre-populate a file-scope variable once during .get_ready?Is there a particular reason to do it in .get_ready instead of .load? In v3 I did it in .load Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Maybe Matching Threads
- [PATCH nbdkit v2] New tmpdisk plugin.
- [PATCH nbdkit v2 3/3] tmpdisk: Implement this plugin using fileops.
- [PATCH nbdkit v2] tmpdisk: Pass any parameters as shell variables to the command.
- [PATCH nbdkit v3] tmpdisk: Pass any parameters as shell variables to the command.
- [PATCH nbdkit 2/2] tmpdisk: Pass any parameters as shell variables to the command.