I still need to add testsuite coverage. Perhaps it might be easier if I create a new '--filter=log logfile=foo' filter that produces a log of which commands a client sent, then compare the log using a known client that uses write_zeroes (qemu-io works well) both with and without --filter=nozero to prove that the change in advertisement changes the commands sent over the wire (that would also test the log filter at the same time...). I'll wait for a review on this series, particularly since patch 2 intentionally breaks filter ABI (more ABI/API breaks on the way when I resubmit my FUA work; it would be nice to make all the API breaks before v1.1.29, rather than releasing versions while things are still in flux). Note that even if we try hard to let newer nbdkit load an older filter (which I did not do here), we absolutely cannot let older nbdkit load a newer filter - while we have sizing limits to guarantee we don't call beyond the end of an older filter's smaller struct, there are no sizing limits in the other direction to prevent a newer filter to know that it cannot call beyond the end of our smaller nbdkit_next_ops. Eric Blake (3): connections: Don't advertise TRIM on readonly connection filter: Add .can_zero/.can_fua overrides filters: Add nozero filter TODO | 2 +- configure.ac | 3 +- docs/nbdkit-filter.pod | 30 +++++++++++- docs/nbdkit.pod | 1 + filters/Makefile.am | 1 + filters/nozero/Makefile.am | 62 ++++++++++++++++++++++++ filters/nozero/nbdkit-nozero-filter.pod | 84 +++++++++++++++++++++++++++++++++ filters/nozero/nozero.c | 68 ++++++++++++++++++++++++++ include/nbdkit-filter.h | 8 +++- src/connections.c | 45 +++++++++++++----- src/filters.c | 70 ++++++++++++++++++++++++--- src/internal.h | 2 + src/plugins.c | 22 +++++++++ 13 files changed, 376 insertions(+), 22 deletions(-) create mode 100644 filters/nozero/Makefile.am create mode 100644 filters/nozero/nbdkit-nozero-filter.pod create mode 100644 filters/nozero/nozero.c -- 2.14.3
Eric Blake
2018-Jan-24 04:10 UTC
[Libguestfs] [nbdkit PATCH 1/3] connections: Don't advertise TRIM on readonly connection
A client cannot attempt TRIM on a read-only connection (we already filter it in connections.c with EROFS); as such, it makes no sense to query the plugin's .can_trim nor to advertise TRIM support to the client on such a connection, similar to how we do not advertise WRITE_ZEROES. FUA only has defined meanings on write operations, but the NBD protocol allows us to advertise it on reads and to ignore clients that send it on read (older qemu did so), so we can continue to advertise it regardless of readonly state. Likewise, while flush is unlikely to make a difference on a read-only connection, there is no reason why it shouldn't be advertised. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/connections.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/connections.c b/src/connections.c index 8e110a5..75c2c2d 100644 --- a/src/connections.c +++ b/src/connections.c @@ -427,6 +427,14 @@ compute_eflags (struct connection *conn, uint16_t *flags) } if (!conn->readonly) { eflags |= NBD_FLAG_SEND_WRITE_ZEROES; + + fl = backend->can_trim (backend, conn); + if (fl == -1) + return -1; + if (fl) { + eflags |= NBD_FLAG_SEND_TRIM; + conn->can_trim = 1; + } } fl = backend->can_flush (backend, conn); @@ -445,14 +453,6 @@ compute_eflags (struct connection *conn, uint16_t *flags) conn->is_rotational = 1; } - fl = backend->can_trim (backend, conn); - if (fl == -1) - return -1; - if (fl) { - eflags |= NBD_FLAG_SEND_TRIM; - conn->can_trim = 1; - } - *flags = eflags; return 0; } -- 2.14.3
Eric Blake
2018-Jan-24 04:10 UTC
[Libguestfs] [nbdkit PATCH 2/3] filter: Add .can_zero/.can_fua overrides
While our plugin code always advertises WRITE_ZEROES on writable connections (because we can emulate .zero by calling .pwrite), and FUA support when .flush exists (because we can emulate the persistence by calling .flush), it is conceivable that a filter may want to explicitly avoid advertising particular bits. (More to the point, I plan on writing a 'nozero' filter that hides WRITE_ZEROES support, at least for the purposes of timing comparisons between server emulation and the client having to send zeroes over the wire). Furthermore, since we only have to honor FUA support on write requests, we do not need to advertise it on a readonly connection. Wire up two new backend callbacks, as well as exposing them to the filter interface (for now, we intentionally do not expose .can_zero to the plugin interface, and later patches will worry about exposing .can_fua as part of updating the plugin interface to support flags). This change breaks filter API; a filter compiled against an earlier nbdkit will not run against this version of nbdkit. We haven't promised ABI stability yet, in part because a filter compiled against an older nbdkit needs to be aware of any new entry points (and other upcoming changes, such as adding FUA flag support); conversely, a filter compiled against a newer nbdkit is liable to try to use new pointers in nbdkit_next_ops that are not part of the current nbdkit, and while we have _struct_size to protect what we call in the filter, we do not have a similar mechanism to protect what the filter can call back in our code. As such, this patch can keep the public interface logically grouped by inserting members in the middle of the struct, rather than worrying about having to stick additions at the end. Signed-off-by: Eric Blake <eblake@redhat.com> --- docs/nbdkit-filter.pod | 21 ++++++++++++++- include/nbdkit-filter.h | 8 +++++- src/connections.c | 29 +++++++++++++++++--- src/filters.c | 70 ++++++++++++++++++++++++++++++++++++++++++++----- src/internal.h | 2 ++ src/plugins.c | 22 ++++++++++++++++ 6 files changed, 140 insertions(+), 12 deletions(-) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index be3c7a0..dbd4385 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -338,7 +338,26 @@ should call C<nbdkit_error> with an error message and return C<-1>. int (*can_trim) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); -These intercept the corresponding plugin methods. +These intercept the corresponding plugin methods, and control feature +bits advertised to the client. + +If there is an error, the callback should call C<nbdkit_error> with an +error message and return C<-1>. + +=head2 C<.can_zero> + +=head2 C<.can_fua> + + int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); + int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); + +These control feature bits that are advertised to the client. These +functions have no counterpart in plugins, because nbdkit can always +emulate zero by using pwrite and emulate Forced Unit Access (FUA) by +using flush; but a filter can implement these to force a bypass of the +nbdkit fallbacks. If there is an error, the callback should call C<nbdkit_error> with an error message and return C<-1>. diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h index 8ea9919..2ff2902 100644 --- a/include/nbdkit-filter.h +++ b/include/nbdkit-filter.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2017 Red Hat Inc. + * Copyright (C) 2013-2018 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -57,6 +57,8 @@ struct nbdkit_next_ops { int (*can_flush) (void *nxdata); int (*is_rotational) (void *nxdata); int (*can_trim) (void *nxdata); + int (*can_zero) (void *nxdata); + int (*can_fua) (void *nxdata); int (*pread) (void *nxdata, void *buf, uint32_t count, uint64_t offset); int (*pwrite) (void *nxdata, @@ -111,6 +113,10 @@ struct nbdkit_filter { void *handle); int (*can_trim) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); + int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); int (*pread) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, void *buf, uint32_t count, uint64_t offset); diff --git a/src/connections.c b/src/connections.c index 75c2c2d..5a7b2d8 100644 --- a/src/connections.c +++ b/src/connections.c @@ -80,6 +80,8 @@ struct connection { int can_flush; int is_rotational; int can_trim; + int can_zero; + int can_fua; int using_tls; int sockin, sockout; @@ -426,7 +428,13 @@ compute_eflags (struct connection *conn, uint16_t *flags) conn->readonly = 1; } if (!conn->readonly) { - eflags |= NBD_FLAG_SEND_WRITE_ZEROES; + fl = backend->can_zero (backend, conn); + if (fl == -1) + return -1; + if (fl) { + eflags |= NBD_FLAG_SEND_WRITE_ZEROES; + conn->can_zero = 1; + } fl = backend->can_trim (backend, conn); if (fl == -1) @@ -435,13 +443,21 @@ compute_eflags (struct connection *conn, uint16_t *flags) eflags |= NBD_FLAG_SEND_TRIM; conn->can_trim = 1; } + + fl = backend->can_fua (backend, conn); + if (fl == -1) + return -1; + if (fl) { + eflags |= NBD_FLAG_SEND_FUA; + conn->can_fua = 1; + } } fl = backend->can_flush (backend, conn); if (fl == -1) return -1; if (fl) { - eflags |= NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA; + eflags |= NBD_FLAG_SEND_FLUSH; conn->can_flush = 1; } @@ -880,7 +896,7 @@ validate_request (struct connection *conn, *error = EINVAL; return false; } - if (!conn->can_flush && (flags & NBD_CMD_FLAG_FUA)) { + if (!conn->can_fua && (flags & NBD_CMD_FLAG_FUA)) { nbdkit_error ("invalid request: FUA flag not supported"); *error = EINVAL; return false; @@ -909,6 +925,13 @@ validate_request (struct connection *conn, return false; } + /* Zero allowed? */ + if (!conn->can_zero && cmd == NBD_CMD_WRITE_ZEROES) { + nbdkit_error ("invalid request: write zeroes operation not supported"); + *error = EINVAL; + return false; + } + return true; /* Command validates. */ } diff --git a/src/filters.c b/src/filters.c index 2804cba..87c6e97 100644 --- a/src/filters.c +++ b/src/filters.c @@ -297,6 +297,20 @@ next_can_trim (void *nxdata) return b_conn->b->can_trim (b_conn->b, b_conn->conn); } +static int +next_can_zero (void *nxdata) +{ + struct b_conn *b_conn = nxdata; + return b_conn->b->can_zero (b_conn->b, b_conn->conn); +} + +static int +next_can_fua (void *nxdata) +{ + struct b_conn *b_conn = nxdata; + return b_conn->b->can_fua (b_conn->b, b_conn->conn); +} + static int next_pread (void *nxdata, void *buf, uint32_t count, uint64_t offset) { @@ -348,6 +362,8 @@ static struct nbdkit_next_ops next_ops = { .flush = next_flush, .trim = next_trim, .zero = next_zero, + .can_zero = next_can_zero, + .can_fua = next_can_fua, }; static int @@ -455,6 +471,36 @@ filter_can_trim (struct backend *b, struct connection *conn) return f->backend.next->can_trim (f->backend.next, conn); } +static int +filter_can_zero (struct backend *b, struct connection *conn) +{ + struct backend_filter *f = container_of (b, struct backend_filter, backend); + void *handle = connection_get_handle (conn, f->backend.i); + struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; + + debug ("can_zero"); + + if (f->filter.can_zero) + return f->filter.can_zero (&next_ops, &nxdata, handle); + else + return f->backend.next->can_zero (f->backend.next, conn); +} + +static int +filter_can_fua (struct backend *b, struct connection *conn) +{ + struct backend_filter *f = container_of (b, struct backend_filter, backend); + void *handle = connection_get_handle (conn, f->backend.i); + struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; + + debug ("can_fua"); + + if (f->filter.can_fua) + return f->filter.can_fua (&next_ops, &nxdata, handle); + else + return f->backend.next->can_fua (f->backend.next, conn); +} + static int filter_pread (struct backend *b, struct connection *conn, void *buf, uint32_t count, uint64_t offset, @@ -582,6 +628,8 @@ static struct backend filter_functions = { .flush = filter_flush, .trim = filter_trim, .zero = filter_zero, + .can_zero = filter_can_zero, + .can_fua = filter_can_fua, }; /* Register and load a filter. */ @@ -626,15 +674,23 @@ filter_register (struct backend *next, size_t index, const char *filename, exit (EXIT_FAILURE); } - /* Since the filter might be much older than the current version of - * nbdkit, only copy up to the self-declared _struct_size of the - * filter and zero out the rest. If the filter is much newer then - * we'll only call the "old" fields. + /* Providing a stable filter ABI is much harder than a stable plugin + * ABI. Any newer filter compiled against a larger struct + * nbdkit_next_ops could potentially dereference fields we did not + * provide; and any older filter compiled against a smaller struct + * nbdkit_filter may miss intercepting functions that are vital to + * avoid corrupting the client-visible data. So for now, we insist + * on an exact match in _struct_size, and promise that + * nbdkit_next_ops won't change size without also changing + * _struct_size. */ size = sizeof f->filter; /* our struct */ - memset (&f->filter, 0, size); - if (size > filter->_struct_size) - size = filter->_struct_size; + if (filter->_struct_size != size) { + fprintf (stderr, + "%s: %s: filter compiled against incompatible nbdkit version\n", + program_name, f->filename); + exit (EXIT_FAILURE); + } memcpy (&f->filter, filter, size); /* Only filter.name is required. */ diff --git a/src/internal.h b/src/internal.h index 3cbfde5..2f03279 100644 --- a/src/internal.h +++ b/src/internal.h @@ -191,6 +191,8 @@ struct backend { int (*flush) (struct backend *, struct connection *conn, uint32_t flags); int (*trim) (struct backend *, struct connection *conn, uint32_t count, uint64_t offset, uint32_t flags); int (*zero) (struct backend *, struct connection *conn, uint32_t count, uint64_t offset, uint32_t flags); + int (*can_zero) (struct backend *, struct connection *conn); + int (*can_fua) (struct backend *, struct connection *conn); }; /* plugins.c */ diff --git a/src/plugins.c b/src/plugins.c index dba3e24..3468b6d 100644 --- a/src/plugins.c +++ b/src/plugins.c @@ -358,6 +358,26 @@ plugin_can_trim (struct backend *b, struct connection *conn) return p->plugin.trim != NULL; } +static int +plugin_can_zero (struct backend *b, struct connection *conn) +{ + debug ("can_zero"); + + // We always allow .zero to fall back to .write, so plugins don't + // need to override this + return plugin_can_write (b, conn); +} + +static int +plugin_can_fua (struct backend *b, struct connection *conn) +{ + debug ("can_fua"); + + // TODO - wire FUA flag support into plugins. Until then, this copies + // can_flush, since that's how we emulate FUA + return plugin_can_flush (b, conn); +} + static int plugin_pread (struct backend *b, struct connection *conn, void *buf, uint32_t count, uint64_t offset, uint32_t flags) @@ -535,6 +555,8 @@ static struct backend plugin_functions = { .flush = plugin_flush, .trim = plugin_trim, .zero = plugin_zero, + .can_zero = plugin_can_zero, + .can_fua = plugin_can_fua, }; /* Register and load a plugin. */ -- 2.14.3
Eric Blake
2018-Jan-24 04:10 UTC
[Libguestfs] [nbdkit PATCH 3/3] filters: Add nozero filter
Sometimes, it's nice to see what a difference it makes in timing or in destination file size when sparse handling is enabled or disabled. Add a new filter that makes it trivial to disable write zero support. Signed-off-by: Eric Blake <eblake@redhat.com> --- TODO | 2 +- configure.ac | 3 +- docs/nbdkit-filter.pod | 9 ++++ docs/nbdkit.pod | 1 + filters/Makefile.am | 1 + filters/nozero/Makefile.am | 62 ++++++++++++++++++++++++ filters/nozero/nbdkit-nozero-filter.pod | 84 +++++++++++++++++++++++++++++++++ filters/nozero/nozero.c | 68 ++++++++++++++++++++++++++ 8 files changed, 228 insertions(+), 2 deletions(-) create mode 100644 filters/nozero/Makefile.am create mode 100644 filters/nozero/nbdkit-nozero-filter.pod create mode 100644 filters/nozero/nozero.c diff --git a/TODO b/TODO index f6ab20c..eb11fc0 100644 --- a/TODO +++ b/TODO @@ -65,7 +65,7 @@ Suggestions for filters ----------------------- * injecting artificial errors or otherwise masking plugin features - (such as hiding zero support) for testing clients + for testing clients (see 'nozero' filter for example) * logging all client commands diff --git a/configure.ac b/configure.ac index 6025ce0..dc7fc76 100644 --- a/configure.ac +++ b/configure.ac @@ -1,5 +1,5 @@ # nbdkit -# Copyright (C) 2013-2017 Red Hat Inc. +# Copyright (C) 2013-2018 Red Hat Inc. # All rights reserved. # # Redistribution and use in source and binary forms, with or without @@ -516,6 +516,7 @@ AC_CONFIG_FILES([Makefile filters/cache/Makefile filters/cow/Makefile filters/delay/Makefile + filters/nozero/Makefile filters/offset/Makefile filters/partition/Makefile src/Makefile diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index dbd4385..d0d9042 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -487,6 +487,15 @@ You can also run pkg-config/pkgconf directly, for example: L<nbdkit(1)>, L<nbdkit-plugin(1)>. +Filters: + +L<nbdkit-cache-filter(1)>, +L<nbdkit-cow-filter(1)>, +L<nbdkit-delay-filter(1)>, +L<nbdkit-nozero-filter(1)>, +L<nbdkit-offset-filter(1)>, +L<nbdkit-partition-filter(1)>. + =head1 AUTHORS Richard W.M. Jones diff --git a/docs/nbdkit.pod b/docs/nbdkit.pod index 1167245..d66bf7c 100644 --- a/docs/nbdkit.pod +++ b/docs/nbdkit.pod @@ -920,6 +920,7 @@ Filters: L<nbdkit-cache-filter(1)>, L<nbdkit-cow-filter(1)>, L<nbdkit-delay-filter(1)>, +L<nbdkit-nozero-filter(1)>, L<nbdkit-offset-filter(1)>, L<nbdkit-partition-filter(1)>. diff --git a/filters/Makefile.am b/filters/Makefile.am index 9996d77..0a3ec0e 100644 --- a/filters/Makefile.am +++ b/filters/Makefile.am @@ -34,5 +34,6 @@ SUBDIRS = \ cache \ cow \ delay \ + nozero \ offset \ partition diff --git a/filters/nozero/Makefile.am b/filters/nozero/Makefile.am new file mode 100644 index 0000000..1c66ed8 --- /dev/null +++ b/filters/nozero/Makefile.am @@ -0,0 +1,62 @@ +# nbdkit +# Copyright (C) 2018 Red Hat Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +EXTRA_DIST = nbdkit-nozero-filter.pod + +CLEANFILES = *~ + +filterdir = $(libdir)/nbdkit/filters + +filter_LTLIBRARIES = nbdkit-nozero-filter.la + +nbdkit_nozero_filter_la_SOURCES = \ + nozero.c \ + $(top_srcdir)/include/nbdkit-filter.h + +nbdkit_nozero_filter_la_CPPFLAGS = \ + -I$(top_srcdir)/include +nbdkit_nozero_filter_la_CFLAGS = \ + $(WARNINGS_CFLAGS) +nbdkit_nozero_filter_la_LDFLAGS = \ + -module -avoid-version -shared + +if HAVE_POD2MAN + +man_MANS = nbdkit-nozero-filter.1 +CLEANFILES += $(man_MANS) + +nbdkit-nozero-filter.1: nbdkit-nozero-filter.pod + $(POD2MAN) $(POD2MAN_ARGS) --section=1 --name=`basename $@ .1` $< $@.t && \ + if grep 'POD ERROR' $@.t; then rm $@.t; exit 1; fi && \ + mv $@.t $@ + +endif diff --git a/filters/nozero/nbdkit-nozero-filter.pod b/filters/nozero/nbdkit-nozero-filter.pod new file mode 100644 index 0000000..dd89767 --- /dev/null +++ b/filters/nozero/nbdkit-nozero-filter.pod @@ -0,0 +1,84 @@ +=encoding utf8 + +=head1 NAME + +nbdkit-nozero-filter - nbdkit nozero filter + +=head1 SYNOPSIS + + nbdkit --filter=nozero plugin [plugin-args...] + +=head1 DESCRIPTION + +C<nbdkit-nozero-filter> is a filter that intentionally disables +efficient handling of sparse file holes (ranges of all-zero bytes) +across the NBD protocol. It is mainly useful for evaluating timing +differences between naive vs. sparse-aware connections, and for +testing client or server fallbacks. + +=head1 PARAMETERS + +There are no parameters specific to nbdkit-nozero-filter. Any +parameters are passed through to and processed by the underlying +plugin in the normal way. + +=head1 EXAMPLES + +Serve the file F<disk.img>, but force the client to write zeroes +explicitly rather than with C<NBD_CMD_WRITE_ZEROES>: + + nbdkit --filter=nozero file file=disk.img + +=head1 SEE ALSO + +L<nbdkit(1)>, +L<nbdkit-file-plugin(1)>, +L<nbdkit-filter(3)>. + +=head1 AUTHORS + +Eric Blake + +=head1 COPYRIGHT + +Copyright (C) 2018 Red Hat Inc. + +=head1 LICENSE + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +=over 4 + +=item * + +Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + +=item * + +Redistributions in binary form must reproduce the above copyright +notice, this list of conditions and the following disclaimer in the +documentation and/or other materials provided with the distribution. + +=item * + +Neither the name of Red Hat nor the names of its contributors may be +used to endorse or promote products derived from this software without +specific prior written permission. + +=back + +THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +SUCH DAMAGE. diff --git a/filters/nozero/nozero.c b/filters/nozero/nozero.c new file mode 100644 index 0000000..676164e --- /dev/null +++ b/filters/nozero/nozero.c @@ -0,0 +1,68 @@ +/* nbdkit + * Copyright (C) 2018 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <string.h> + +#include <nbdkit-filter.h> + +#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL + +/* Disable WRITE_ZEROES. */ +static int +nozero_can_zero (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) +{ + return 0; +} + +/* Should never be reached (nbdkit should have already flagged a bad client). */ +static int +nozero_zero (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offs, int may_trim) +{ + abort (); +} + +static struct nbdkit_filter filter = { + .name = "nozero", + .longname = "nbdkit nozero filter", + .version = PACKAGE_VERSION, + .can_zero = nozero_can_zero, + .zero = nozero_zero, +}; + +NBDKIT_REGISTER_FILTER(filter) -- 2.14.3
Richard W.M. Jones
2018-Jan-24 14:18 UTC
Re: [Libguestfs] [nbdkit PATCH 0/3] Add nozero filter
On Tue, Jan 23, 2018 at 10:10:12PM -0600, Eric Blake wrote:> I still need to add testsuite coverage. Perhaps it might be easier > if I create a new '--filter=log logfile=foo' filter that produces > a log of which commands a client sent, then compare the log using > a known client that uses write_zeroes (qemu-io works well) both > with and without --filter=nozero to prove that the change in > advertisement changes the commands sent over the wire (that would > also test the log filter at the same time...). > > I'll wait for a review on this series, particularly since patch 2 > intentionally breaks filter ABI (more ABI/API breaks on the way > when I resubmit my FUA work; it would be nice to make all the API > breaks before v1.1.29, rather than releasing versions while things > are still in flux). > > Note that even if we try hard to let newer nbdkit load an older > filter (which I did not do here), we absolutely cannot let older > nbdkit load a newer filter - while we have sizing limits to > guarantee we don't call beyond the end of an older filter's > smaller struct, there are no sizing limits in the other > direction to prevent a newer filter to know that it cannot call > beyond the end of our smaller nbdkit_next_ops.Hmm, yes, this is a good point that I didn't think about. Probably we're going to have to forget about the filter._api_version and filter._struct_size fields and instead encode the version of nbdkit it was compiled against and check that rigorously. It seems like maintaining a filter ABI is hopeless. Rich.> Eric Blake (3): > connections: Don't advertise TRIM on readonly connection > filter: Add .can_zero/.can_fua overrides > filters: Add nozero filter > > TODO | 2 +- > configure.ac | 3 +- > docs/nbdkit-filter.pod | 30 +++++++++++- > docs/nbdkit.pod | 1 + > filters/Makefile.am | 1 + > filters/nozero/Makefile.am | 62 ++++++++++++++++++++++++ > filters/nozero/nbdkit-nozero-filter.pod | 84 +++++++++++++++++++++++++++++++++ > filters/nozero/nozero.c | 68 ++++++++++++++++++++++++++ > include/nbdkit-filter.h | 8 +++- > src/connections.c | 45 +++++++++++++----- > src/filters.c | 70 ++++++++++++++++++++++++--- > src/internal.h | 2 + > src/plugins.c | 22 +++++++++ > 13 files changed, 376 insertions(+), 22 deletions(-) > create mode 100644 filters/nozero/Makefile.am > create mode 100644 filters/nozero/nbdkit-nozero-filter.pod > create mode 100644 filters/nozero/nozero.c > > -- > 2.14.3 > > _______________________________________________ > Libguestfs mailing list > Libguestfs@redhat.com > https://www.redhat.com/mailman/listinfo/libguestfs-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
Richard W.M. Jones
2018-Jan-24 14:19 UTC
Re: [Libguestfs] [nbdkit PATCH 1/3] connections: Don't advertise TRIM on readonly connection
On Tue, Jan 23, 2018 at 10:10:13PM -0600, Eric Blake wrote:> A client cannot attempt TRIM on a read-only connection (we already > filter it in connections.c with EROFS); as such, it makes no sense > to query the plugin's .can_trim nor to advertise TRIM support to > the client on such a connection, similar to how we do not > advertise WRITE_ZEROES. > > FUA only has defined meanings on write operations, but the NBD > protocol allows us to advertise it on reads and to ignore clients > that send it on read (older qemu did so), so we can continue to > advertise it regardless of readonly state. Likewise, while flush > is unlikely to make a difference on a read-only connection, there > is no reason why it shouldn't be advertised. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > src/connections.c | 16 ++++++++-------- > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/src/connections.c b/src/connections.c > index 8e110a5..75c2c2d 100644 > --- a/src/connections.c > +++ b/src/connections.c > @@ -427,6 +427,14 @@ compute_eflags (struct connection *conn, uint16_t *flags) > } > if (!conn->readonly) { > eflags |= NBD_FLAG_SEND_WRITE_ZEROES; > + > + fl = backend->can_trim (backend, conn); > + if (fl == -1) > + return -1; > + if (fl) { > + eflags |= NBD_FLAG_SEND_TRIM; > + conn->can_trim = 1; > + } > } > > fl = backend->can_flush (backend, conn); > @@ -445,14 +453,6 @@ compute_eflags (struct connection *conn, uint16_t *flags) > conn->is_rotational = 1; > } > > - fl = backend->can_trim (backend, conn); > - if (fl == -1) > - return -1; > - if (fl) { > - eflags |= NBD_FLAG_SEND_TRIM; > - conn->can_trim = 1; > - } > - > *flags = eflags; > return 0; > } > --Simple & obvious, ACK. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Richard W.M. Jones
2018-Jan-24 14:26 UTC
Re: [Libguestfs] [nbdkit PATCH 2/3] filter: Add .can_zero/.can_fua overrides
On Tue, Jan 23, 2018 at 10:10:14PM -0600, Eric Blake wrote:> @@ -348,6 +362,8 @@ static struct nbdkit_next_ops next_ops = { > .flush = next_flush, > .trim = next_trim, > .zero = next_zero, > + .can_zero = next_can_zero, > + .can_fua = next_can_fua,Not that it makes a functional difference but for ease of reading the code maybe we should initialize these in the same order as they appear in the struct definition?> @@ -582,6 +628,8 @@ static struct backend filter_functions = { > .flush = filter_flush, > .trim = filter_trim, > .zero = filter_zero, > + .can_zero = filter_can_zero, > + .can_fua = filter_can_fua,Since the backend struct is internal, we can put the can_* functions next to the others. (Also same comments about the change to src/internal.h)> > /* Register and load a filter. */ > @@ -626,15 +674,23 @@ filter_register (struct backend *next, size_t index, const char *filename, > exit (EXIT_FAILURE); > } > > - /* Since the filter might be much older than the current version of > - * nbdkit, only copy up to the self-declared _struct_size of the > - * filter and zero out the rest. If the filter is much newer then > - * we'll only call the "old" fields. > + /* Providing a stable filter ABI is much harder than a stable plugin > + * ABI. Any newer filter compiled against a larger struct > + * nbdkit_next_ops could potentially dereference fields we did not > + * provide; and any older filter compiled against a smaller struct > + * nbdkit_filter may miss intercepting functions that are vital to > + * avoid corrupting the client-visible data. So for now, we insist > + * on an exact match in _struct_size, and promise that > + * nbdkit_next_ops won't change size without also changing > + * _struct_size. > */ > size = sizeof f->filter; /* our struct */ > - memset (&f->filter, 0, size); > - if (size > filter->_struct_size) > - size = filter->_struct_size; > + if (filter->_struct_size != size) { > + fprintf (stderr, > + "%s: %s: filter compiled against incompatible nbdkit version\n", > + program_name, f->filename); > + exit (EXIT_FAILURE); > + } > memcpy (&f->filter, filter, size);Can we put this into a separate commit?> diff --git a/src/internal.h b/src/internal.h > index 3cbfde5..2f03279 100644 > --- a/src/internal.h > +++ b/src/internal.h > @@ -191,6 +191,8 @@ struct backend { > int (*flush) (struct backend *, struct connection *conn, uint32_t flags); > int (*trim) (struct backend *, struct connection *conn, uint32_t count, uint64_t offset, uint32_t flags); > int (*zero) (struct backend *, struct connection *conn, uint32_t count, uint64_t offset, uint32_t flags); > + int (*can_zero) (struct backend *, struct connection *conn); > + int (*can_fua) (struct backend *, struct connection *conn);See above.> > /* plugins.c */ > diff --git a/src/plugins.c b/src/plugins.c > index dba3e24..3468b6d 100644 > --- a/src/plugins.c > +++ b/src/plugins.c > @@ -358,6 +358,26 @@ plugin_can_trim (struct backend *b, struct connection *conn) > return p->plugin.trim != NULL; > } > > +static int > +plugin_can_zero (struct backend *b, struct connection *conn) > +{ > + debug ("can_zero"); > + > + // We always allow .zero to fall back to .write, so plugins don't > + // need to override this > + return plugin_can_write (b, conn); > +} > + > +static int > +plugin_can_fua (struct backend *b, struct connection *conn) > +{ > + debug ("can_fua"); > + > + // TODO - wire FUA flag support into plugins. Until then, this copies > + // can_flush, since that's how we emulate FUA > + return plugin_can_flush (b, conn); > +} > + > static int > plugin_pread (struct backend *b, struct connection *conn, > void *buf, uint32_t count, uint64_t offset, uint32_t flags) > @@ -535,6 +555,8 @@ static struct backend plugin_functions = { > .flush = plugin_flush, > .trim = plugin_trim, > .zero = plugin_zero, > + .can_zero = plugin_can_zero, > + .can_fua = plugin_can_fua,See above. ACK with changes suggested. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Richard W.M. Jones
2018-Jan-24 14:27 UTC
Re: [Libguestfs] [nbdkit PATCH 3/3] filters: Add nozero filter
On Tue, Jan 23, 2018 at 10:10:15PM -0600, Eric Blake wrote:> Sometimes, it's nice to see what a difference it makes in timing > or in destination file size when sparse handling is enabled or > disabled. Add a new filter that makes it trivial to disable > write zero support. > > Signed-off-by: Eric Blake <eblake@redhat.com>This one looks fine, ACK. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
On 01/24/2018 08:18 AM, Richard W.M. Jones wrote:>> Note that even if we try hard to let newer nbdkit load an older >> filter (which I did not do here), we absolutely cannot let older >> nbdkit load a newer filter - while we have sizing limits to >> guarantee we don't call beyond the end of an older filter's >> smaller struct, there are no sizing limits in the other >> direction to prevent a newer filter to know that it cannot call >> beyond the end of our smaller nbdkit_next_ops. > > Hmm, yes, this is a good point that I didn't think about. > > Probably we're going to have to forget about the filter._api_version > and filter._struct_size fields and instead encode the version of > nbdkit it was compiled against and check that rigorously. It seems > like maintaining a filter ABI is hopeless.About the only thing I could think of that MIGHT let us preserve ABI from future filter to older nbdkit is if instead of using next_ops.foo(), we add functions named next_foo(). nbdkit-filter.h would then define weak-linkage functions that can only error out, but which get overridden when available by strong linkage functions in nbdkit proper. Then a newer filter run against an older nbdkit will call the weak-linkage function and see a proper failure in trying to call the next function, while a filter run against the current nbdkit will call the strong-linkage variant that does the intended passthrough to the next layer. But that's a lot of effort (and a one-time API break) to set up something that will allow a stable ABI from that point on, even if we didn't have the issue that running a newer plugin against an older nbdkit may not be a reasonable goal (my FUA patches already demonstrated that having new nbdkit run an old plugin is easy, but having an old nbdkit run a new plugin may not work if the new plugin only provides the new API callbacks and nbdkit-plugin.h doesn't provide fallback shims in the old API callbacks). As requested on 2/3, I'll factor out the ABI check into a separate patch, and go with your suggestion that a simple version check is easier than using a struct size check - which does indeed cement us into a filter must be re-compiled in lockstep with nbdkit (we may still choose to relax that later, but it's always better to start conservative). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Possibly Parallel Threads
- [nbdkit PATCH] nozero: Add notrim mode
- [nbdkit PATCH v2 24/24] nocache: Implement new filter
- [PATCH nbdkit 1/4] Add truncate filter for truncating or extending the size of plugins.
- [PATCH v2 nbdkit 5/6] Add truncate filter for truncating or extending the size of plugins.
- [nbdkit PATCH 2/2] filters: Add log filter