Richard W.M. Jones
2019-Mar-20 22:11 UTC
[Libguestfs] [PATCH nbdkit 0/8] Implement extents using a simpler array.
Not sure what version we're up to, but this reimplements extents using the new simpler structure described in this thread: https://www.redhat.com/archives/libguestfs/2019-March/msg00077.html I also fixed most of the things that Eric pointed out in the previous review, although I need to go back over his replies and check I've got everything. This needs a bit more testing. However the tests do work. Rich.
Richard W.M. Jones
2019-Mar-20 22:11 UTC
[Libguestfs] [PATCH nbdkit 1/8] server: Implement extents/can_extents calls for plugins and filters.
This pair of calls allows plugins to describe which extents in the virtual disk are allocated, holes or zeroes. --- docs/nbdkit-filter.pod | 92 +++++++++++++++++++++ docs/nbdkit-plugin.pod | 99 +++++++++++++++++++++++ include/nbdkit-common.h | 10 ++- include/nbdkit-filter.h | 22 +++++- include/nbdkit-plugin.h | 6 +- server/internal.h | 4 + server/extents.c | 171 ++++++++++++++++++++++++++++++++++++++++ server/filters.c | 59 ++++++++++++++ server/plugins.c | 61 +++++++++++++- .gitignore | 1 + server/Makefile.am | 1 + server/nbdkit.syms | 5 ++ 12 files changed, 527 insertions(+), 4 deletions(-) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index dc9a262..1711351 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -350,6 +350,8 @@ calls. =head2 C<.can_zero> +=head2 C<.can_extents> + =head2 C<.can_fua> =head2 C<.can_multi_conn> @@ -365,6 +367,8 @@ calls. void *handle); int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_extents) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -513,6 +517,94 @@ value to return to the client. The filter should never fail with C<EOPNOTSUPP> (while plugins have automatic fallback to C<.pwrite>, filters do not). +=head2 C<.extents> + + int (*extents) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, + int *err); + +This intercepts the plugin C<.extents> method and can be used to +modify extent requests. + +This function will not be called if C<.can_extents> returned false; in +turn, the filter should not call C<next_ops-E<gt>extents> if +C<next_ops-E<gt>can_extents> did not return true. + +The C<extents> parameter passed to this function is empty, and +should usually be passed straight to the underlying plugin: + + myfilter_extents (...) + { + return next_ops->extents (nxdata, count, offset, flags, + extents, err); + } + +It is also possible for filters to transform the extents list received +back from the layer below. Without error checking it would look like +this: + + myfilter_extents (...) + { + size_t i; + struct nbdkit_extents *extents2; + struct nbdkit_extent e; + + extents2 = nbdkit_extents_new (offset); + next_ops->extents (nxdata, count, offset, flags, extents2, err); + for (i = 0; i < nbdkit_extents_size (extents2); ++i) { + e = nbdkit_get_extent (extents2, i); + e.offset = /* transform offset */; + nbdkit_add_extent (extents, e.offset, e.length, e.type); + } + nbdkit_extents_free (extents2); + } + +More complicated transformations may require the filter to allocate a +new extents list using C<nbdkit_extents_new> and free the old one +using C<nbdkit_extents_free>. + +If there is an error, C<.extents> should call C<nbdkit_error> with an +error message B<and> return -1 with C<err> set to the positive errno +value to return to the client. + +=head3 Allocating and freeing nbdkit_extents list + +Two functions are provided to filters only for allocating and freeing +the map: + + struct nbdkit_extents *nbdkit_extents_new (uint64_t start); + +Allocates and returns a new, empty extents list. The C<start> +parameter is the start of the range described in the list. Normally +you would pass in C<offset> as this parameter. + +On error this function can return C<NULL>. In this case it calls +C<nbdkit_error> and/or C<nbdkit_set_error> as required. + + void nbdkit_extents_free (struct nbdkit_extents *); + +Frees an existing extents list. + +=head3 Iterating over nbdkit_extents list + +Two functions are provided to filters only to iterate over the extents +in order: + + size_t nbdkit_extents_size (const struct nbdkit_extents *); + +Returns the number of extents in the list. + + struct nbdkit_extent { + uint64_t offset; + uint64_t length; + uint32_t type; + }; + struct nbdkit_extent nbdkit_get_extent (const struct nbdkit_extents *, + size_t i); + +Returns a copy of the C<i>'th extent. + =head1 ERROR HANDLING If there is an error in the filter itself, the filter should call diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod index 47545f3..a603f8b 100644 --- a/docs/nbdkit-plugin.pod +++ b/docs/nbdkit-plugin.pod @@ -555,6 +555,20 @@ This callback is not required. If omitted, then nbdkit always tries C<.zero> first if it is present, and gracefully falls back to C<.pwrite> if C<.zero> was absent or failed with C<EOPNOTSUPP>. +=head2 C<.can_extents> + + int can_extents (void *handle); + +This is called during the option negotiation phase to find out if the +plugin supports detecting allocated (non-sparse) regions of the disk +with the C<.extents> callback. + +If there is an error, C<.can_extents> should call C<nbdkit_error> with +an error message and return C<-1>. + +This callback is not required. If omitted, then we return true iff a +C<.extents> callback has been defined. + =head2 C<.can_fua> int can_fua (void *handle); @@ -717,6 +731,91 @@ If there is an error, C<.zero> should call C<nbdkit_error> with an error message, and C<nbdkit_set_error> to record an appropriate error (unless C<errno> is sufficient), then return C<-1>. +=head2 C<.extents> + + int extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents); + +During the data serving phase, this callback is used to detect +allocated, sparse and zeroed regions of the disk. + +This function will not be called if C<.can_extents> returned false. +nbdkit's default behaviour in this case is to treat the whole virtual +disk as if it was allocated. + +The callback should detect and return the list of extents overlapping +the range C<[offset...offset+count-1]>. The C<extents> parameter +points to an opaque object which the callback should fill in by +calling C<nbdkit_add_extent>. See L</Extents list> below. + +If there is an error, C<.extents> should call C<nbdkit_error> with an +error message, and C<nbdkit_set_error> to record an appropriate error +(unless C<errno> is sufficient), then return C<-1>. + +=head3 Extents list + +The plugin C<extents> callback is passed an opaque pointer C<struct +nbdkit_extents *extents>. This structure represents a list of +L<filesystem extents|https://en.wikipedia.org/wiki/Extent_(file_systems)> +describing which areas of the disk are allocated, which are sparse +(“holes”), and, if supported, which are zeroes. + +The C<extents> callback should scan the disk starting at C<offset> and +call C<nbdkit_add_extent> for each extent found. + +Extents overlapping the range C<[offset...offset+count-1]> should be +returned if possible. However nbdkit ignores extents E<lt> offset so +the plugin may, if it is easier to implement, return all extent +information for the whole disk. The plugin may return extents beyond +the end of the range. It may also return extent information for less +than the whole range, but it must return at least one extent. + +The extents B<must> be added in ascending order, and B<must> be +contiguous. + +The C<flags> parameter may contain the flag C<NBDKIT_FLAG_REQ_ONE> +which means that the client is only requesting information about the +extent overlapping C<offset>. The plugin may ignore this flag, or as +an optimization it may return just a single extent for C<offset>. +(Note that implementing this optimization correctly is difficult and +subtle. You must ensure that the upper limit of the single extent +returned is correct. If unsure then the safest option is to ignore +this flag.) + + int nbdkit_add_extent (struct nbdkit_extents *extents, + uint64_t offset, uint64_t length, uint32_t type); + +Add an extent covering C<[offset...offset+length-1]> of one of +the following four types: + +=over 4 + +=item C<type = 0> + +A normal, allocated data extent. + +=item C<type = NBDKIT_EXTENT_HOLE|NBDKIT_EXTENT_ZERO> + +An unallocated extent, a.k.a. a “hole”, which reads back as zeroes. +This is the normal type of hole applicable to most disks. + +=item C<type = NBDKIT_EXTENT_ZERO> + +An allocated extent which is known to contain only zeroes. + +=item C<type = NBDKIT_EXTENT_HOLE> + +An unallocated extent (hole) which does not read back as zeroes. Note +this should only be used in specialized circumstances such as when +writing a plugin for (or to emulate) certain SCSI drives which do not +guarantee that trimmed blocks read back as zeroes. + +=back + +C<nbdkit_extent_add> returns C<0> on success or C<-1> on failure. On +failure C<nbdkit_error> and/or C<nbdkit_set_error> has already been +called. + =head1 THREADS Each nbdkit plugin must declare its thread safety model by defining diff --git a/include/nbdkit-common.h b/include/nbdkit-common.h index cb9954e..7c69b14 100644 --- a/include/nbdkit-common.h +++ b/include/nbdkit-common.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -60,11 +60,15 @@ extern "C" { #define NBDKIT_FLAG_MAY_TRIM (1<<0) /* Maps to !NBD_CMD_FLAG_NO_HOLE */ #define NBDKIT_FLAG_FUA (1<<1) /* Maps to NBD_CMD_FLAG_FUA */ +#define NBDKIT_FLAG_REQ_ONE (1<<2) /* Maps to NBD_CMD_FLAG_REQ_ONE */ #define NBDKIT_FUA_NONE 0 #define NBDKIT_FUA_EMULATE 1 #define NBDKIT_FUA_NATIVE 2 +#define NBDKIT_EXTENT_HOLE (1<<0) /* Same as NBD_STATE_HOLE */ +#define NBDKIT_EXTENT_ZERO (1<<1) /* Same as NBD_STATE_ZERO */ + extern void nbdkit_error (const char *msg, ...) ATTRIBUTE_FORMAT_PRINTF (1, 2); extern void nbdkit_verror (const char *msg, va_list args); extern void nbdkit_debug (const char *msg, ...) ATTRIBUTE_FORMAT_PRINTF (1, 2); @@ -76,6 +80,10 @@ extern int nbdkit_parse_bool (const char *str); extern int nbdkit_read_password (const char *value, char **password); extern char *nbdkit_realpath (const char *path); +struct nbdkit_extents; +extern int nbdkit_add_extent (struct nbdkit_extents *, + uint64_t offset, uint64_t length, uint32_t type); + /* A static non-NULL pointer which can be used when you don't need a * per-connection handle. */ diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h index 71c06c8..06a3eca 100644 --- a/include/nbdkit-filter.h +++ b/include/nbdkit-filter.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -44,6 +44,18 @@ extern "C" { #define NBDKIT_FILTER_API_VERSION 2 +struct nbdkit_extent { + uint64_t offset; + uint64_t length; + uint32_t type; +}; + +extern struct nbdkit_extents *nbdkit_extents_new (uint64_t start); +extern void nbdkit_extents_free (struct nbdkit_extents *); +extern size_t nbdkit_extents_size (const struct nbdkit_extents *); +extern struct nbdkit_extent nbdkit_get_extent (const struct nbdkit_extents *, + size_t); + typedef int nbdkit_next_config (void *nxdata, const char *key, const char *value); typedef int nbdkit_next_config_complete (void *nxdata); @@ -58,6 +70,7 @@ struct nbdkit_next_ops { int (*is_rotational) (void *nxdata); int (*can_trim) (void *nxdata); int (*can_zero) (void *nxdata); + int (*can_extents) (void *nxdata); int (*can_fua) (void *nxdata); int (*can_multi_conn) (void *nxdata); @@ -71,6 +84,8 @@ struct nbdkit_next_ops { int *err); int (*zero) (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, int *err); + int (*extents) (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err); }; struct nbdkit_filter { @@ -120,6 +135,8 @@ struct nbdkit_filter { void *handle); int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_extents) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -140,6 +157,9 @@ struct nbdkit_filter { int (*zero) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, uint32_t count, uint64_t offset, uint32_t flags, int *err); + int (*extents) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err); }; #define NBDKIT_REGISTER_FILTER(filter) \ diff --git a/include/nbdkit-plugin.h b/include/nbdkit-plugin.h index d43b2f5..20d193c 100644 --- a/include/nbdkit-plugin.h +++ b/include/nbdkit-plugin.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -125,6 +125,10 @@ struct nbdkit_plugin { const char *magic_config_key; int (*can_multi_conn) (void *handle); + + int (*can_extents) (void *handle); + int (*extents) (void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents); }; extern void nbdkit_set_error (int err); diff --git a/server/internal.h b/server/internal.h index d40a82d..71d2815 100644 --- a/server/internal.h +++ b/server/internal.h @@ -274,6 +274,7 @@ struct backend { int (*is_rotational) (struct backend *, struct connection *conn); int (*can_trim) (struct backend *, struct connection *conn); int (*can_zero) (struct backend *, struct connection *conn); + int (*can_extents) (struct backend *, struct connection *conn); int (*can_fua) (struct backend *, struct connection *conn); int (*can_multi_conn) (struct backend *, struct connection *conn); @@ -287,6 +288,9 @@ struct backend { uint64_t offset, uint32_t flags, int *err); int (*zero) (struct backend *, struct connection *conn, uint32_t count, uint64_t offset, uint32_t flags, int *err); + int (*extents) (struct backend *, struct connection *conn, uint32_t count, + uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err); }; /* plugins.c */ diff --git a/server/extents.c b/server/extents.c new file mode 100644 index 0000000..93ed017 --- /dev/null +++ b/server/extents.c @@ -0,0 +1,171 @@ +/* nbdkit + * Copyright (C) 2019 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <stdbool.h> +#include <string.h> +#include <inttypes.h> +#include <assert.h> + +#include "minmax.h" + +#include "internal.h" + +struct nbdkit_extents { + struct nbdkit_extent *extents; + size_t nr_extents, allocated; + uint64_t start; +}; + +struct nbdkit_extents * +nbdkit_extents_new (uint64_t start) +{ + struct nbdkit_extents *r; + + if (start >= INT64_MAX) { + nbdkit_error ("nbdkit_extents_new: start (%" PRIu64 ") > INT64_MAX", + start); + return NULL; + } + + r = malloc (sizeof *r); + if (r == NULL) { + nbdkit_error ("nbdkit_extents_new: malloc: %m"); + return NULL; + } + r->extents = NULL; + r->nr_extents = r->allocated = 0; + r->start = start; + return r; +} + +void +nbdkit_extents_free (struct nbdkit_extents *exts) +{ + if (exts) { + free (exts->extents); + free (exts); + } +} + +size_t +nbdkit_extents_size (const struct nbdkit_extents *exts) +{ + return exts->nr_extents; +} + +const struct nbdkit_extent +nbdkit_get_extent (const struct nbdkit_extents *exts, size_t i) +{ + assert (i < exts->nr_extents); + return exts->extents[i]; +} + +/* Insert *e in the list at the end. */ +static int +append_extent (struct nbdkit_extents *exts, const struct nbdkit_extent *e) +{ + if (exts->nr_extents >= exts->allocated) { + size_t new_allocated; + struct nbdkit_extent *new_extents; + + new_allocated = exts->allocated; + if (new_allocated == 0) + new_allocated = 1; + new_allocated *= 2; + new_extents + realloc (exts->extents, new_allocated * sizeof (struct nbdkit_extent)); + if (new_extents == NULL) { + nbdkit_error ("nbdkit_add_extent: realloc: %m"); + return -1; + } + exts->allocated = new_allocated; + exts->extents = new_extents; + } + + exts->extents[exts->nr_extents] = *e; + exts->nr_extents++; + return 0; +} + +int +nbdkit_add_extent (struct nbdkit_extents *exts, + uint64_t offset, uint64_t length, uint32_t type) +{ + uint64_t overlap; + + /* If there are existing extents, the new extent must be contiguous. */ + if (exts->nr_extents > 0) { + const struct nbdkit_extent *ep; + + ep = &exts->extents[exts->nr_extents-1]; + if (offset != ep->offset + ep->length) { + nbdkit_error ("nbdkit_add_extent: " + "extents must be added in ascending order and " + "must be contiguous"); + return -1; + } + } + else { + /* If there are no existing extents, and the new extent is + * entirely before start, ignore it. + */ + if (offset + length <= exts->start) + return 0; + + /* If there are no existing extents, and the new extent is after + * start, then this is a bug in the plugin. + */ + if (offset > exts->start) { + nbdkit_error ("nbdkit_add_extent: " + "first extent must not be > start (%" PRIu64 ")", + exts->start); + return -1; + } + + /* If there are no existing extents, and the new extent overlaps + * start, truncate it so it starts at start. + */ + overlap = exts->start - offset; + length -= overlap; + offset += overlap; + } + + const struct nbdkit_extent e + { .offset = offset, .length = length, .type = type }; + return append_extent (exts, &e); +} diff --git a/server/filters.c b/server/filters.c index 5b7abc4..5095188 100644 --- a/server/filters.c +++ b/server/filters.c @@ -309,6 +309,13 @@ next_can_zero (void *nxdata) return b_conn->b->can_zero (b_conn->b, b_conn->conn); } +static int +next_can_extents (void *nxdata) +{ + struct b_conn *b_conn = nxdata; + return b_conn->b->can_extents (b_conn->b, b_conn->conn); +} + static int next_can_fua (void *nxdata) { @@ -364,6 +371,15 @@ next_zero (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, return b_conn->b->zero (b_conn->b, b_conn->conn, count, offset, flags, err); } +static int +next_extents (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + struct b_conn *b_conn = nxdata; + return b_conn->b->extents (b_conn->b, b_conn->conn, count, offset, flags, + extents, err); +} + static struct nbdkit_next_ops next_ops = { .get_size = next_get_size, .can_write = next_can_write, @@ -371,6 +387,7 @@ static struct nbdkit_next_ops next_ops = { .is_rotational = next_is_rotational, .can_trim = next_can_trim, .can_zero = next_can_zero, + .can_extents = next_can_extents, .can_fua = next_can_fua, .can_multi_conn = next_can_multi_conn, .pread = next_pread, @@ -378,6 +395,7 @@ static struct nbdkit_next_ops next_ops = { .flush = next_flush, .trim = next_trim, .zero = next_zero, + .extents = next_extents, }; static int @@ -511,6 +529,21 @@ filter_can_zero (struct backend *b, struct connection *conn) return f->backend.next->can_zero (f->backend.next, conn); } +static int +filter_can_extents (struct backend *b, struct connection *conn) +{ + struct backend_filter *f = container_of (b, struct backend_filter, backend); + void *handle = connection_get_handle (conn, f->backend.i); + struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; + + debug ("%s: can_extents", f->name); + + if (f->filter.can_extents) + return f->filter.can_extents (&next_ops, &nxdata, handle); + else + return f->backend.next->can_extents (f->backend.next, conn); +} + static int filter_can_fua (struct backend *b, struct connection *conn) { @@ -646,6 +679,30 @@ filter_zero (struct backend *b, struct connection *conn, count, offset, flags, err); } +static int +filter_extents (struct backend *b, struct connection *conn, + uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + struct backend_filter *f = container_of (b, struct backend_filter, backend); + void *handle = connection_get_handle (conn, f->backend.i); + struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; + + assert (!(flags & ~NBDKIT_FLAG_REQ_ONE)); + + debug ("%s: extents count=%" PRIu32 " offset=%" PRIu64 " flags=0x%" PRIx32, + f->name, count, offset, flags); + + if (f->filter.extents) + return f->filter.extents (&next_ops, &nxdata, handle, + count, offset, flags, + extents, err); + else + return f->backend.next->extents (f->backend.next, conn, + count, offset, flags, + extents, err); +} + static struct backend filter_functions = { .free = filter_free, .thread_model = filter_thread_model, @@ -667,6 +724,7 @@ static struct backend filter_functions = { .is_rotational = filter_is_rotational, .can_trim = filter_can_trim, .can_zero = filter_can_zero, + .can_extents = filter_can_extents, .can_fua = filter_can_fua, .can_multi_conn = filter_can_multi_conn, .pread = filter_pread, @@ -674,6 +732,7 @@ static struct backend filter_functions = { .flush = filter_flush, .trim = filter_trim, .zero = filter_zero, + .extents = filter_extents, }; /* Register and load a filter. */ diff --git a/server/plugins.c b/server/plugins.c index 28b96ad..15afa56 100644 --- a/server/plugins.c +++ b/server/plugins.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -199,6 +199,8 @@ plugin_dump_fields (struct backend *b) HAS (trim); HAS (zero); HAS (can_multi_conn); + HAS (can_extents); + HAS (extents); #undef HAS /* Custom fields. */ @@ -392,6 +394,21 @@ plugin_can_zero (struct backend *b, struct connection *conn) return plugin_can_write (b, conn); } +static int +plugin_can_extents (struct backend *b, struct connection *conn) +{ + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + + assert (connection_get_handle (conn, 0)); + + debug ("can_extents"); + + if (p->plugin.can_extents) + return p->plugin.can_extents (connection_get_handle (conn, 0)); + else + return p->plugin.extents != NULL; +} + static int plugin_can_fua (struct backend *b, struct connection *conn) { @@ -651,6 +668,46 @@ plugin_zero (struct backend *b, struct connection *conn, return r; } +static int +plugin_extents (struct backend *b, struct connection *conn, + uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + bool req_one = flags & NBDKIT_FLAG_REQ_ONE; + int r; + + assert (connection_get_handle (conn, 0)); + assert (!(flags & ~NBDKIT_FLAG_REQ_ONE)); + + debug ("extents count=%" PRIu32 " offset=%" PRIu64 " req_one=%d", + count, offset, req_one); + + if (!count) + return 0; + + if (p->plugin.extents) { + r = p->plugin.extents (connection_get_handle (conn, 0), count, offset, + flags, extents); + if (r >= 0 && nbdkit_extents_size (extents) < 1) { + nbdkit_error ("extents: plugin must return at least one extent"); + nbdkit_set_error (EINVAL); + r = -1; + } + if (r == -1) + *err = get_error (p); + return r; + } + else { + /* By default we assume that everything in the range is allocated. */ + errno = 0; + r = nbdkit_add_extent (extents, offset, count, 0 /* allocated data */); + if (r == -1) + *err = errno ? errno : EINVAL; + return r; + } +} + static struct backend plugin_functions = { .free = plugin_free, .thread_model = plugin_thread_model, @@ -672,6 +729,7 @@ static struct backend plugin_functions = { .is_rotational = plugin_is_rotational, .can_trim = plugin_can_trim, .can_zero = plugin_can_zero, + .can_extents = plugin_can_extents, .can_fua = plugin_can_fua, .can_multi_conn = plugin_can_multi_conn, .pread = plugin_pread, @@ -679,6 +737,7 @@ static struct backend plugin_functions = { .flush = plugin_flush, .trim = plugin_trim, .zero = plugin_zero, + .extents = plugin_extents, }; /* Register and load a plugin. */ diff --git a/.gitignore b/.gitignore index dc42abd..1f8b2e0 100644 --- a/.gitignore +++ b/.gitignore @@ -63,6 +63,7 @@ Makefile.in /server/nbdkit.pc /server/protostrings.c /server/synopsis.c +/server/test-extents /server/test-utils /stamp-h1 /tests/disk diff --git a/server/Makefile.am b/server/Makefile.am index 5eb575e..c73223a 100644 --- a/server/Makefile.am +++ b/server/Makefile.am @@ -43,6 +43,7 @@ nbdkit_SOURCES = \ connections.c \ crypto.c \ debug.c \ + extents.c \ filters.c \ internal.h \ locks.c \ diff --git a/server/nbdkit.syms b/server/nbdkit.syms index 672abd2..18d2b9c 100644 --- a/server/nbdkit.syms +++ b/server/nbdkit.syms @@ -40,8 +40,13 @@ # The functions we want plugins and filters to call. global: nbdkit_absolute_path; + nbdkit_add_extent; nbdkit_debug; nbdkit_error; + nbdkit_extents_free; + nbdkit_extents_new; + nbdkit_extents_size; + nbdkit_get_extent; nbdkit_parse_bool; nbdkit_parse_size; nbdkit_read_password; -- 2.20.1
Richard W.M. Jones
2019-Mar-20 22:11 UTC
[Libguestfs] [PATCH nbdkit 2/8] server: Add CLEANUP_EXTENTS_FREE macro.
Provides automatic cleanup of ‘struct nbdkit_extents_map *’ on exit from a scope or function. --- server/internal.h | 2 ++ server/cleanup.c | 8 +++++++- server/Makefile.am | 3 ++- 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/server/internal.h b/server/internal.h index 71d2815..ae51804 100644 --- a/server/internal.h +++ b/server/internal.h @@ -139,6 +139,8 @@ extern void change_user (void); /* cleanup.c */ extern void cleanup_free (void *ptr); #define CLEANUP_FREE __attribute__((cleanup (cleanup_free))) +extern void cleanup_extents_free (void *ptr); +#define CLEANUP_EXTENTS_FREE __attribute__((cleanup (cleanup_extents_free))) extern void cleanup_unlock (pthread_mutex_t **ptr); #define CLEANUP_UNLOCK __attribute__((cleanup (cleanup_unlock))) #define ACQUIRE_LOCK_FOR_CURRENT_SCOPE(mutex) \ diff --git a/server/cleanup.c b/server/cleanup.c index c29ecec..1eec613 100644 --- a/server/cleanup.c +++ b/server/cleanup.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -46,6 +46,12 @@ cleanup_free (void *ptr) free (* (void **) ptr); } +void +cleanup_extents_free (void *ptr) +{ + nbdkit_extents_free (* (void **) ptr); +} + void cleanup_unlock (pthread_mutex_t **ptr) { diff --git a/server/Makefile.am b/server/Makefile.am index c73223a..fb21d42 100644 --- a/server/Makefile.am +++ b/server/Makefile.am @@ -137,7 +137,8 @@ check_PROGRAMS = test-utils test_utils_SOURCES = \ test-utils.c \ utils.c \ - cleanup.c + cleanup.c \ + extents.c test_utils_CPPFLAGS = \ -I$(top_srcdir)/include \ -I$(top_srcdir)/common/include -- 2.20.1
Richard W.M. Jones
2019-Mar-20 22:11 UTC
[Libguestfs] [PATCH nbdkit 3/8] server: Implement Block Status requests to read allocation status.
This commit implements the core NBD protocol for the "base:allocation" Block Status replies. --- server/internal.h | 7 ++ server/protocol.h | 15 +++ server/protocol-handshake-newstyle.c | 81 ++++++++++++++- server/protocol.c | 141 ++++++++++++++++++++++++--- 4 files changed, 229 insertions(+), 15 deletions(-) diff --git a/server/internal.h b/server/internal.h index ae51804..58d5794 100644 --- a/server/internal.h +++ b/server/internal.h @@ -183,6 +183,7 @@ struct connection { bool can_multi_conn; bool using_tls; bool structured_replies; + bool meta_context_base_allocation; int sockin, sockout; connection_recv_function recv; @@ -219,6 +220,12 @@ extern int protocol_handshake_newstyle (struct connection *conn) extern int protocol_recv_request_send_reply (struct connection *conn) __attribute__((__nonnull__ (1))); +/* The context ID of base:allocation. As far as I can tell it doesn't + * matter what this is as long as nbdkit always returns the same + * number. + */ +#define base_allocation_id 1 + /* crypto.c */ #define root_tls_certificates_dir sysconfdir "/pki/" PACKAGE_NAME extern void crypto_init (bool tls_set_on_cli); diff --git a/server/protocol.h b/server/protocol.h index 06b917e..a7de2f0 100644 --- a/server/protocol.h +++ b/server/protocol.h @@ -112,6 +112,7 @@ extern const char *name_of_nbd_rep (int); #define NBD_REP_ACK 1 #define NBD_REP_SERVER 2 #define NBD_REP_INFO 3 +#define NBD_REP_META_CONTEXT 4 #define NBD_REP_ERR_UNSUP 0x80000001 #define NBD_REP_ERR_POLICY 0x80000002 #define NBD_REP_ERR_INVALID 0x80000003 @@ -128,6 +129,18 @@ struct fixed_new_option_reply_info_export { uint16_t eflags; /* per-export flags */ } __attribute__((packed)); +/* NBD_REP_META_CONTEXT reply (follows fixed_new_option_reply). */ +struct fixed_new_option_reply_meta_context { + uint32_t context_id; /* metadata context ID */ + /* followed by a string */ +} __attribute__((packed)); + +/* NBD_REPLY_TYPE_BLOCK_STATUS block descriptor. */ +struct block_descriptor { + uint32_t length; /* length of block */ + uint32_t status_flags; /* block type (hole etc) */ +} __attribute__((packed)); + /* New-style handshake server reply when using NBD_OPT_EXPORT_NAME. * Modern clients use NBD_OPT_GO instead of this. */ @@ -199,10 +212,12 @@ extern const char *name_of_nbd_cmd (int); #define NBD_CMD_FLUSH 3 #define NBD_CMD_TRIM 4 #define NBD_CMD_WRITE_ZEROES 6 +#define NBD_CMD_BLOCK_STATUS 7 extern const char *name_of_nbd_cmd_flag (int); #define NBD_CMD_FLAG_FUA (1<<0) #define NBD_CMD_FLAG_NO_HOLE (1<<1) +#define NBD_CMD_FLAG_REQ_ONE (1<<3) /* Error codes (previously errno). * See http://git.qemu.org/?p=qemu.git;a=commitdiff;h=ca4414804114fd0095b317785bc0b51862e62ebb diff --git a/server/protocol-handshake-newstyle.c b/server/protocol-handshake-newstyle.c index db01f7b..0be185c 100644 --- a/server/protocol-handshake-newstyle.c +++ b/server/protocol-handshake-newstyle.c @@ -133,6 +133,34 @@ send_newstyle_option_reply_info_export (struct connection *conn, return 0; } +static int +send_newstyle_option_reply_meta_context (struct connection *conn, + uint32_t option, uint32_t reply, + uint32_t context_id, + const char *name) +{ + struct fixed_new_option_reply fixed_new_option_reply; + struct fixed_new_option_reply_meta_context context; + const size_t namelen = strlen (name); + + fixed_new_option_reply.magic = htobe64 (NBD_REP_MAGIC); + fixed_new_option_reply.option = htobe32 (option); + fixed_new_option_reply.reply = htobe32 (reply); + fixed_new_option_reply.replylen = htobe32 (sizeof context + namelen); + context.context_id = htobe32 (context_id); + + if (conn->send (conn, + &fixed_new_option_reply, + sizeof fixed_new_option_reply) == -1 || + conn->send (conn, &context, sizeof context) == -1 || + conn->send (conn, name, namelen) == -1) { + nbdkit_error ("write: %m"); + return -1; + } + + return 0; +} + /* Sub-function during negotiate_handshake_newstyle, to uniformly handle * a client hanging up on a message boundary. */ @@ -452,6 +480,8 @@ negotiate_handshake_newstyle_options (struct connection *conn) case NBD_OPT_LIST_META_CONTEXT: case NBD_OPT_SET_META_CONTEXT: { + int r; + bool can_extents; uint32_t opt_index; uint32_t exportnamelen; uint32_t nr_queries; @@ -469,6 +499,16 @@ negotiate_handshake_newstyle_options (struct connection *conn) continue; } + /* Work out if the server supports base:allocation. */ + r = backend->can_extents (backend, conn); + if (r == -1) { + if (send_newstyle_option_reply (conn, option, NBD_REP_ERR_INVALID) + == -1) + return -1; + continue; + } + can_extents = r; + /* Minimum length of the option payload is: * 32 bit export name length followed by empty export name * + 32 bit number of queries followed by no queries @@ -503,7 +543,17 @@ negotiate_handshake_newstyle_options (struct connection *conn) * for SET: nr_queries == 0 means reset all contexts */ if (nr_queries == 0) { - /* Nothing is supported now. */ + if (option == NBD_OPT_SET_META_CONTEXT) + conn->meta_context_base_allocation = false; + else /* LIST */ { + if (can_extents) { + if (send_newstyle_option_reply_meta_context + (conn, option, NBD_REP_META_CONTEXT, + 0, "base:allocation") == -1) + return -1; + } + } + if (send_newstyle_option_reply (conn, option, NBD_REP_ACK) == -1) return -1; } @@ -525,7 +575,34 @@ negotiate_handshake_newstyle_options (struct connection *conn) option == NBD_OPT_LIST_META_CONTEXT ? "query" : "set", (int) querylen, &data[opt_index]); - /* Ignore query - nothing is supported. */ + /* For LIST, "base:" returns all supported contexts in the + * base namespace. We only support "base:allocation". + */ + if (option == NBD_OPT_LIST_META_CONTEXT && + querylen == 5 && + strncmp (&data[opt_index], "base:", 5) == 0) { + if (can_extents) { + if (send_newstyle_option_reply_meta_context + (conn, option, NBD_REP_META_CONTEXT, + 0, "base:allocation") == -1) + return -1; + } + } + /* "base:allocation" requested by name. */ + else if (querylen == 15 && + strncmp (&data[opt_index], "base:allocation", 15) == 0) { + if (can_extents) { + if (send_newstyle_option_reply_meta_context + (conn, option, NBD_REP_META_CONTEXT, + option == NBD_OPT_SET_META_CONTEXT + ? base_allocation_id : 0, + "base:allocation") == -1) + return -1; + if (option == NBD_OPT_SET_META_CONTEXT) + conn->meta_context_base_allocation = true; + } + } + /* Every other query must be ignored. */ opt_index += querylen; nr_queries--; diff --git a/server/protocol.c b/server/protocol.c index f117d42..466be7c 100644 --- a/server/protocol.c +++ b/server/protocol.c @@ -36,6 +36,7 @@ #include <stdio.h> #include <stdlib.h> #include <stdint.h> +#include <stdbool.h> #include <inttypes.h> #include <string.h> #include <unistd.h> @@ -78,6 +79,7 @@ validate_request (struct connection *conn, case NBD_CMD_WRITE: case NBD_CMD_TRIM: case NBD_CMD_WRITE_ZEROES: + case NBD_CMD_BLOCK_STATUS: if (!valid_range (conn, offset, count)) { /* XXX Allow writes to extend the disk? */ nbdkit_error ("invalid request: %s: offset and count are out of range: " @@ -106,7 +108,8 @@ validate_request (struct connection *conn, } /* Validate flags */ - if (flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) { + if (flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE | + NBD_CMD_FLAG_REQ_ONE)) { nbdkit_error ("invalid request: unknown flag (0x%x)", flags); *error = EINVAL; return false; @@ -117,6 +120,12 @@ validate_request (struct connection *conn, *error = EINVAL; return false; } + if ((flags & NBD_CMD_FLAG_REQ_ONE) && + cmd != NBD_CMD_BLOCK_STATUS) { + nbdkit_error ("invalid request: REQ_ONE flag needs BLOCK_STATUS request"); + *error = EINVAL; + return false; + } if (!conn->can_fua && (flags & NBD_CMD_FLAG_FUA)) { nbdkit_error ("invalid request: FUA flag not supported"); *error = EINVAL; @@ -157,14 +166,37 @@ validate_request (struct connection *conn, return false; } + /* Block status allowed? */ + if (cmd == NBD_CMD_BLOCK_STATUS) { + if (!conn->structured_replies) { + nbdkit_error ("invalid request: " + "%s: structured replies was not negotiated", + name_of_nbd_cmd (cmd)); + *error = EINVAL; + return false; + } + if (!conn->meta_context_base_allocation) { + nbdkit_error ("invalid request: " + "%s: base:allocation was not negotiated", + name_of_nbd_cmd (cmd)); + *error = EINVAL; + return false; + } + } + return true; /* Command validates. */ } /* This is called with the request lock held to actually execute the * request (by calling the plugin). Note that the request fields have * been validated already in 'validate_request' so we don't have to - * check them again. 'buf' is either the data to be written or the - * data to be returned, and points to a buffer of size 'count' bytes. + * check them again. + * + * 'buf' is either the data to be written or the data to be returned, + * and points to a buffer of size 'count' bytes. + * + * 'extents' is an empty extents list used for block status requests + * only. * * In all cases, the return value is the system errno value that will * later be converted to the nbd error to send back to the client (0 @@ -173,7 +205,7 @@ validate_request (struct connection *conn, static uint32_t handle_request (struct connection *conn, uint16_t cmd, uint16_t flags, uint64_t offset, uint32_t count, - void *buf) + void *buf, struct nbdkit_extents *extents) { uint32_t f = 0; bool fua = conn->can_fua && (flags & NBD_CMD_FLAG_FUA); @@ -217,6 +249,14 @@ handle_request (struct connection *conn, return err; break; + case NBD_CMD_BLOCK_STATUS: + if (flags & NBD_CMD_FLAG_REQ_ONE) + f |= NBDKIT_FLAG_REQ_ONE; + if (backend->extents (backend, conn, count, offset, f, + extents, &err) == -1) + return err; + break; + default: abort (); } @@ -359,6 +399,64 @@ send_structured_reply_read (struct connection *conn, return 1; /* command processed ok */ } +static int +send_structured_reply_block_status (struct connection *conn, + uint64_t handle, + uint16_t cmd, uint16_t flags, + uint32_t count, uint64_t offset, + struct nbdkit_extents *extents) +{ + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&conn->write_lock); + struct structured_reply reply; + size_t nr_extents = nbdkit_extents_size (extents); + uint32_t context_id; + size_t i; + int r; + + assert (cmd == NBD_CMD_BLOCK_STATUS); + + reply.magic = htobe32 (NBD_STRUCTURED_REPLY_MAGIC); + reply.handle = handle; + reply.flags = htobe16 (NBD_REPLY_FLAG_DONE); + reply.type = htobe16 (NBD_REPLY_TYPE_BLOCK_STATUS); + reply.length = htobe32 (sizeof context_id + + nr_extents * sizeof (struct block_descriptor)); + + r = conn->send (conn, &reply, sizeof reply); + if (r == -1) { + nbdkit_error ("write reply: %s: %m", name_of_nbd_cmd (cmd)); + return connection_set_status (conn, -1); + } + + /* Send the base:allocation context ID. */ + context_id = htobe32 (base_allocation_id); + r = conn->send (conn, &context_id, sizeof context_id); + if (r == -1) { + nbdkit_error ("write reply: %s: %m", name_of_nbd_cmd (cmd)); + return connection_set_status (conn, -1); + } + + /* Send each block descriptor. */ + for (i = 0; i < nr_extents; ++i) { + const struct nbdkit_extent e = nbdkit_get_extent (extents, i); + struct block_descriptor bd; + + if (i == 0) + assert (e.offset == offset); + + bd.length = htobe32 (e.length); + bd.status_flags = htobe32 (e.type & 3); + + r = conn->send (conn, &bd, sizeof bd); + if (r == -1) { + nbdkit_error ("write reply: %s: %m", name_of_nbd_cmd (cmd)); + return connection_set_status (conn, -1); + } + } + + return 1; /* command processed ok */ +} + static int send_structured_reply_error (struct connection *conn, uint64_t handle, uint16_t cmd, uint32_t error) @@ -402,6 +500,7 @@ protocol_recv_request_send_reply (struct connection *conn) uint32_t magic, count, error = 0; uint64_t offset; CLEANUP_FREE char *buf = NULL; + CLEANUP_EXTENTS_FREE struct nbdkit_extents *extents = NULL; /* Read the request packet. */ { @@ -449,6 +548,7 @@ protocol_recv_request_send_reply (struct connection *conn) if (cmd == NBD_CMD_READ || cmd == NBD_CMD_WRITE) { buf = malloc (count); if (buf == NULL) { + out_of_memory: perror ("malloc"); error = ENOMEM; if (cmd == NBD_CMD_WRITE && @@ -458,6 +558,13 @@ protocol_recv_request_send_reply (struct connection *conn) } } + /* Allocate the extents list for block status only. */ + if (cmd == NBD_CMD_BLOCK_STATUS) { + extents = nbdkit_extents_new (offset); + if (extents == NULL) + goto out_of_memory; + } + /* Receive the write data buffer. */ if (cmd == NBD_CMD_WRITE) { r = conn->recv (conn, buf, count); @@ -478,7 +585,7 @@ protocol_recv_request_send_reply (struct connection *conn) } else { lock_request (conn); - error = handle_request (conn, cmd, flags, offset, count, buf); + error = handle_request (conn, cmd, flags, offset, count, buf, extents); assert ((int) error >= 0); unlock_request (conn); } @@ -498,15 +605,23 @@ protocol_recv_request_send_reply (struct connection *conn) } /* Currently we prefer to send simple replies for everything except - * where we have to (ie. NBD_CMD_READ when structured_replies have - * been negotiated). However this prevents us from sending - * human-readable error messages to the client, so we should - * reconsider this in future. + * where we have to (ie. NBD_CMD_READ and NBD_CMD_BLOCK_STATUS when + * structured_replies have been negotiated). However this prevents + * us from sending human-readable error messages to the client, so + * we should reconsider this in future. */ - if (conn->structured_replies && cmd == NBD_CMD_READ) { - if (!error) - return send_structured_reply_read (conn, request.handle, cmd, - buf, count, offset); + if (conn->structured_replies && + (cmd == NBD_CMD_READ || cmd == NBD_CMD_BLOCK_STATUS)) { + if (!error) { + if (cmd == NBD_CMD_READ) + return send_structured_reply_read (conn, request.handle, cmd, + buf, count, offset); + else /* NBD_CMD_BLOCK_STATUS */ + return send_structured_reply_block_status (conn, request.handle, + cmd, flags, + count, offset, + extents); + } else return send_structured_reply_error (conn, request.handle, cmd, error); } -- 2.20.1
Richard W.M. Jones
2019-Mar-20 22:11 UTC
[Libguestfs] [PATCH nbdkit 4/8] offset: Implement mapping of extents.
Allows you to safely use nbdkit-offset-filter on top of a plugin supporting extents. --- filters/offset/offset.c | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/filters/offset/offset.c b/filters/offset/offset.c index 058571d..35791bf 100644 --- a/filters/offset/offset.c +++ b/filters/offset/offset.c @@ -37,6 +37,7 @@ #include <stdlib.h> #include <stdint.h> #include <string.h> +#include <assert.h> #include <nbdkit-filter.h> @@ -132,6 +133,39 @@ offset_zero (struct nbdkit_next_ops *next_ops, void *nxdata, return next_ops->zero (nxdata, count, offs + offset, flags, err); } +/* Extents. */ +static int +offset_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offs, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + size_t i; + struct nbdkit_extents *extents2; + struct nbdkit_extent e; + + extents2 = nbdkit_extents_new (offs + offset); + if (extents2 == NULL) { + *err = errno; + return -1; + } + if (next_ops->extents (nxdata, count, offs + offset, + flags, extents2, err) == -1) + goto error; + + for (i = 0; i < nbdkit_extents_size (extents2); ++i) { + e = nbdkit_get_extent (extents2, i); + e.offset -= offset; + if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) + goto error; + } + nbdkit_extents_free (extents2); + return 0; + + error: + nbdkit_extents_free (extents2); + return -1; +} + static struct nbdkit_filter filter = { .name = "offset", .longname = "nbdkit offset filter", @@ -144,6 +178,7 @@ static struct nbdkit_filter filter = { .pwrite = offset_pwrite, .trim = offset_trim, .zero = offset_zero, + .extents = offset_extents, }; NBDKIT_REGISTER_FILTER(filter) -- 2.20.1
Richard W.M. Jones
2019-Mar-20 22:11 UTC
[Libguestfs] [PATCH nbdkit 5/8] partition: Implement mapping of extents.
--- filters/partition/partition.c | 36 +++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/filters/partition/partition.c b/filters/partition/partition.c index c1fe7a1..ace9f4b 100644 --- a/filters/partition/partition.c +++ b/filters/partition/partition.c @@ -38,6 +38,7 @@ #include <stdint.h> #include <string.h> #include <inttypes.h> +#include <assert.h> #include <nbdkit-filter.h> @@ -222,6 +223,40 @@ partition_zero (struct nbdkit_next_ops *next_ops, void *nxdata, return next_ops->zero (nxdata, count, offs + h->offset, flags, err); } +/* Extents. */ +static int +partition_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offs, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + struct handle *h = handle; + size_t i; + struct nbdkit_extents *extents2; + struct nbdkit_extent e; + + extents2 = nbdkit_extents_new (offs + h->offset); + if (extents2 == NULL) { + *err = errno; + return -1; + } + if (next_ops->extents (nxdata, count, offs + h->offset, + flags, extents2, err) == -1) + goto error; + + for (i = 0; i < nbdkit_extents_size (extents2); ++i) { + e = nbdkit_get_extent (extents2, i); + e.offset -= h->offset; + if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) + goto error; + } + nbdkit_extents_free (extents2); + return 0; + + error: + nbdkit_extents_free (extents2); + return -1; +} + static struct nbdkit_filter filter = { .name = "partition", .longname = "nbdkit partition filter", @@ -237,6 +272,7 @@ static struct nbdkit_filter filter = { .pwrite = partition_pwrite, .trim = partition_trim, .zero = partition_zero, + .extents = partition_extents, }; NBDKIT_REGISTER_FILTER(filter) -- 2.20.1
Richard W.M. Jones
2019-Mar-20 22:11 UTC
[Libguestfs] [PATCH nbdkit 6/8] data, memory: Implement extents.
These plugins are both based on the same sparse array structure which supports a simple implementation of extents. --- common/sparse/sparse.h | 7 ++++++- common/sparse/sparse.c | 29 ++++++++++++++++++++++++++++- plugins/data/data.c | 16 +++++++++++++++- plugins/memory/memory.c | 16 +++++++++++++++- 4 files changed, 64 insertions(+), 4 deletions(-) diff --git a/common/sparse/sparse.h b/common/sparse/sparse.h index 818d804..eb24a0b 100644 --- a/common/sparse/sparse.h +++ b/common/sparse/sparse.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -87,4 +87,9 @@ extern void sparse_array_zero (struct sparse_array *sa, uint32_t count, uint64_t offset) __attribute__((__nonnull__ (1))); +/* Return information about allocated pages and holes. */ +extern int sparse_array_extents (struct sparse_array *sa, + uint32_t count, uint64_t offset, + struct nbdkit_extents *extents); + #endif /* NBDKIT_SPARSE_H */ diff --git a/common/sparse/sparse.c b/common/sparse/sparse.c index a5ace48..66b3007 100644 --- a/common/sparse/sparse.c +++ b/common/sparse/sparse.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -360,3 +360,30 @@ sparse_array_zero (struct sparse_array *sa, uint32_t count, uint64_t offset) offset += n; } } + +int +sparse_array_extents (struct sparse_array *sa, + uint32_t count, uint64_t offset, + struct nbdkit_extents *extents) +{ + uint32_t n, type; + void *p; + + while (count > 0) { + p = lookup (sa, offset, false, &n, NULL); + if (n > count) + n = count; + + if (p == NULL) + type = NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO; + else + type = 0; /* allocated data */ + if (nbdkit_add_extent (extents, offset, n, type) == -1) + return -1; + + count -= n; + offset += n; + } + + return 0; +} diff --git a/plugins/data/data.c b/plugins/data/data.c index f9d3881..c5540f6 100644 --- a/plugins/data/data.c +++ b/plugins/data/data.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018 Red Hat Inc. + * Copyright (C) 2018-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -378,6 +378,19 @@ data_trim (void *handle, uint32_t count, uint64_t offset) return 0; } +/* Extents. */ +static int +data_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) +{ + int r; + + pthread_mutex_lock (&lock); + r = sparse_array_extents (sa, count, offset, extents); + pthread_mutex_unlock (&lock); + return r; +} + static struct nbdkit_plugin plugin = { .name = "data", .version = PACKAGE_VERSION, @@ -394,6 +407,7 @@ static struct nbdkit_plugin plugin = { .pwrite = data_pwrite, .zero = data_zero, .trim = data_trim, + .extents = data_extents, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ diff --git a/plugins/memory/memory.c b/plugins/memory/memory.c index e27e127..741eaad 100644 --- a/plugins/memory/memory.c +++ b/plugins/memory/memory.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -170,6 +170,19 @@ memory_trim (void *handle, uint32_t count, uint64_t offset) return 0; } +/* Extents. */ +static int +memory_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) +{ + int r; + + pthread_mutex_lock (&lock); + r = sparse_array_extents (sa, count, offset, extents); + pthread_mutex_unlock (&lock); + return r; +} + static struct nbdkit_plugin plugin = { .name = "memory", .version = PACKAGE_VERSION, @@ -185,6 +198,7 @@ static struct nbdkit_plugin plugin = { .pwrite = memory_pwrite, .zero = memory_zero, .trim = memory_trim, + .extents = memory_extents, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ -- 2.20.1
Richard W.M. Jones
2019-Mar-20 22:11 UTC
[Libguestfs] [PATCH nbdkit 7/8] vddk: Implement extents.
This uses a new API VixDiskLib_QueryAllocatedBlocks provided in VDDK >= 6.7. Thanks: Martin Kletzander. --- plugins/vddk/vddk-structs.h | 15 +++- plugins/vddk/vddk.c | 138 ++++++++++++++++++++++++++++++++++++ 2 files changed, 152 insertions(+), 1 deletion(-) diff --git a/plugins/vddk/vddk-structs.h b/plugins/vddk/vddk-structs.h index dbed94a..df88322 100644 --- a/plugins/vddk/vddk-structs.h +++ b/plugins/vddk/vddk-structs.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -47,6 +47,9 @@ typedef uint64_t VixError; #define VIXDISKLIB_FLAG_OPEN_READ_ONLY 4 #define VIXDISKLIB_SECTOR_SIZE 512 +#define VIXDISKLIB_MIN_CHUNK_SIZE 128 +#define VIXDISKLIB_MAX_CHUNK_NUMBER (512*1024) + typedef void *VixDiskLibConnection; typedef void *VixDiskLibHandle; @@ -124,4 +127,14 @@ typedef struct VixDiskLibInfo { char *uuid; } VixDiskLibInfo; +typedef struct { + uint64_t offset; + uint64_t length; +} VixDiskLibBlock; + +typedef struct { + uint32_t numBlocks; + VixDiskLibBlock blocks[1]; +} VixDiskLibBlockList; + #endif /* NBDKIT_VDDK_STRUCTS_H */ diff --git a/plugins/vddk/vddk.c b/plugins/vddk/vddk.c index 56871cc..d056190 100644 --- a/plugins/vddk/vddk.c +++ b/plugins/vddk/vddk.c @@ -45,6 +45,8 @@ #include <nbdkit-plugin.h> #include "isaligned.h" +#include "minmax.h" +#include "rounding.h" #include "vddk-structs.h" @@ -67,6 +69,8 @@ static VixError (*VixDiskLib_GetInfo) (VixDiskLibHandle handle, VixDiskLibInfo * static void (*VixDiskLib_FreeInfo) (VixDiskLibInfo *info); static VixError (*VixDiskLib_Read) (VixDiskLibHandle handle, uint64_t start_sector, uint64_t nr_sectors, unsigned char *buf); static VixError (*VixDiskLib_Write) (VixDiskLibHandle handle, uint64_t start_sector, uint64_t nr_sectors, const unsigned char *buf); +static VixError (*VixDiskLib_QueryAllocatedBlocks) (VixDiskLibHandle diskHandle, uint64_t start_sector, uint64_t nr_sectors, uint64_t chunk_size, VixDiskLibBlockList **block_list); +static VixError (*VixDiskLib_FreeBlockList) (VixDiskLibBlockList *block_list); /* Parameters passed to InitEx. */ #define VDDK_MAJOR 5 @@ -174,6 +178,11 @@ vddk_load (void) VixDiskLib_FreeInfo = dlsym (dl, "VixDiskLib_FreeInfo"); VixDiskLib_Read = dlsym (dl, "VixDiskLib_Read"); VixDiskLib_Write = dlsym (dl, "VixDiskLib_Write"); + + /* Added in VDDK 6.7, these will be NULL for earlier versions: */ + VixDiskLib_QueryAllocatedBlocks + dlsym (dl, "VixDiskLib_QueryAllocatedBlocks"); + VixDiskLib_FreeBlockList = dlsym (dl, "VixDiskLib_FreeBlockList"); } static void @@ -570,6 +579,133 @@ vddk_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset) return 0; } +static int +vddk_can_extents (void *handle) +{ + struct vddk_handle *h = handle; + VixError err; + VixDiskLibBlockList *block_list; + + /* This call was added in VDDK 6.7. In earlier versions the + * function pointer will be NULL and we cannot query extents. + */ + if (VixDiskLib_QueryAllocatedBlocks == NULL) { + nbdkit_debug ("can_extents: VixDiskLib_QueryAllocatedBlocks == NULL, " + "probably this is VDDK < 6.7"); + return 0; + } + + /* However even when the call is available it rarely works well so + * the best thing we can do here is to try the call and if it's + * non-functional return false. + */ + DEBUG_CALL ("VixDiskLib_QueryAllocatedBlocks", + "handle, 0, %d sectors, %d sectors", + VIXDISKLIB_MIN_CHUNK_SIZE, VIXDISKLIB_MIN_CHUNK_SIZE); + err = VixDiskLib_QueryAllocatedBlocks (h->handle, + 0, VIXDISKLIB_MIN_CHUNK_SIZE, + VIXDISKLIB_MIN_CHUNK_SIZE, + &block_list); + if (err == VIX_OK) { + DEBUG_CALL ("VixDiskLib_FreeBlockList", "block_list"); + VixDiskLib_FreeBlockList (block_list); + } + if (err != VIX_OK) { + char *errmsg = VixDiskLib_GetErrorText (err, NULL); + nbdkit_debug ("can_extents: VixDiskLib_QueryAllocatedBlocks test failed, " + "extents support will be disabled: " + "original error: %s", + errmsg); + VixDiskLib_FreeErrorText (errmsg); + return 0; + } + + return 1; +} + +static int +vddk_extents (void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents) +{ + struct vddk_handle *h = handle; + bool req_one = flags & NBDKIT_FLAG_REQ_ONE; + uint64_t position, end, start_sector; + + position = offset; + + /* We can only query whole chunks. Therefore start with the first + * chunk before offset. + */ + end = offset + count; + start_sector + ROUND_DOWN (offset, VIXDISKLIB_MIN_CHUNK_SIZE * VIXDISKLIB_SECTOR_SIZE) + / VIXDISKLIB_SECTOR_SIZE; + while (start_sector * VIXDISKLIB_SECTOR_SIZE < end) { + VixError err; + uint32_t i; + uint64_t nr_chunks, nr_sectors; + VixDiskLibBlockList *block_list; + + assert (IS_ALIGNED (start_sector, VIXDISKLIB_MIN_CHUNK_SIZE)); + + nr_chunks + ROUND_UP (end - start_sector * VIXDISKLIB_SECTOR_SIZE, + VIXDISKLIB_MIN_CHUNK_SIZE * VIXDISKLIB_SECTOR_SIZE) + / (VIXDISKLIB_MIN_CHUNK_SIZE * VIXDISKLIB_SECTOR_SIZE); + nr_chunks = MIN (nr_chunks, VIXDISKLIB_MAX_CHUNK_NUMBER); + nr_sectors = nr_chunks * VIXDISKLIB_MIN_CHUNK_SIZE; + + DEBUG_CALL ("VixDiskLib_QueryAllocatedBlocks", + "handle, %" PRIu64 " sectors, %" PRIu64 " sectors, " + "%d sectors", + start_sector, nr_sectors, VIXDISKLIB_MIN_CHUNK_SIZE); + err = VixDiskLib_QueryAllocatedBlocks (h->handle, + start_sector, nr_sectors, + VIXDISKLIB_MIN_CHUNK_SIZE, + &block_list); + if (err != VIX_OK) { + VDDK_ERROR (err, "VixDiskLib_QueryAllocatedBlocks"); + return -1; + } + + for (i = 0; i < block_list->numBlocks; ++i) { + uint64_t offset, length; + + offset = block_list->blocks[i].offset * VIXDISKLIB_SECTOR_SIZE; + length = block_list->blocks[i].length * VIXDISKLIB_SECTOR_SIZE; + + /* The query returns blocks. We must insert holes between the + * blocks as necessary. + */ + if (position < offset) { + if (nbdkit_add_extent (extents, + offset, length, + NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO) == -1) + goto error_in_add; + } + + if (nbdkit_add_extent (extents, + offset, length, 0 /* allocated data */) == -1) { + error_in_add: + DEBUG_CALL ("VixDiskLib_FreeBlockList", "block_list"); + VixDiskLib_FreeBlockList (block_list); + return -1; + } + + position = offset + length; + } + DEBUG_CALL ("VixDiskLib_FreeBlockList", "block_list"); + VixDiskLib_FreeBlockList (block_list); + + start_sector += nr_sectors; + + if (req_one) + break; + } + + return 0; +} + static struct nbdkit_plugin plugin = { .name = "vddk", .longname = "VMware VDDK plugin", @@ -585,6 +721,8 @@ static struct nbdkit_plugin plugin = { .get_size = vddk_get_size, .pread = vddk_pread, .pwrite = vddk_pwrite, + .can_extents = vddk_can_extents, + .extents = vddk_extents, }; NBDKIT_REGISTER_PLUGIN(plugin) -- 2.20.1
Richard W.M. Jones
2019-Mar-20 22:11 UTC
[Libguestfs] [PATCH nbdkit 8/8] file: Implement extents.
This uses lseek SEEK_DATA/SEEK_HOLE to search for allocated data and holes in the underlying file. --- plugins/file/file.c | 139 ++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 129 insertions(+), 10 deletions(-) diff --git a/plugins/file/file.c b/plugins/file/file.c index 628f8fb..22fdfcd 100644 --- a/plugins/file/file.c +++ b/plugins/file/file.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -37,6 +37,7 @@ #include <stdlib.h> #include <stdbool.h> #include <string.h> +#include <inttypes.h> #include <fcntl.h> #include <unistd.h> #include <sys/types.h> @@ -44,6 +45,8 @@ #include <sys/ioctl.h> #include <errno.h> +#include <pthread.h> + #if defined(__linux__) && !defined(FALLOC_FL_PUNCH_HOLE) #include <linux/falloc.h> /* For FALLOC_FL_*, glibc < 2.18 */ #endif @@ -68,7 +71,11 @@ static char *filename = NULL; -int file_debug_zero; /* to enable: -D file.zero=1 */ +/* Any callbacks using lseek must be protected by this lock. */ +static pthread_mutex_t lseek_lock = PTHREAD_MUTEX_INITIALIZER; + +/* to enable: -D file.zero=1 */ +int file_debug_zero; static void file_unload (void) @@ -220,6 +227,21 @@ file_close (void *handle) #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL +/* For block devices, stat->st_size is not the true size. */ +static int64_t +block_device_size (int fd) +{ + off_t size; + + size = lseek (fd, 0, SEEK_END); + if (size == -1) { + nbdkit_error ("lseek (to find device size): %m"); + return -1; + } + + return size; +} + /* Get the file size. */ static int64_t file_get_size (void *handle) @@ -227,15 +249,11 @@ file_get_size (void *handle) struct handle *h = handle; if (h->is_block_device) { - /* Block device, so st_size will not be the true size. */ - off_t size; - - size = lseek (h->fd, 0, SEEK_END); - if (size == -1) { - nbdkit_error ("lseek (to find device size): %m"); - return -1; - } + int64_t size; + pthread_mutex_lock (&lseek_lock); + size = block_device_size (h->fd); + pthread_mutex_unlock (&lseek_lock); return size; } else { /* Regular file. */ @@ -501,6 +519,103 @@ file_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags) return 0; } +#ifdef SEEK_HOLE +/* Extents. */ + +static int +file_can_extents (void *handle) +{ + struct handle *h = handle; + off_t r; + + /* A simple test to see whether SEEK_HOLE etc is likely to work on + * the current filesystem. + */ + pthread_mutex_lock (&lseek_lock); + r = lseek (h->fd, 0, SEEK_HOLE); + pthread_mutex_unlock (&lseek_lock); + if (r == -1) { + nbdkit_debug ("extents disabled: lseek: SEEK_HOLE: %m"); + return 0; + } + return 1; +} + +static int +do_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) +{ + struct handle *h = handle; + const bool req_one = flags & NBDKIT_FLAG_REQ_ONE; + uint64_t end = offset + count; + + do { + off_t pos; + + pos = lseek (h->fd, offset, SEEK_DATA); + if (pos == -1) { + if (errno == ENXIO) { + /* The current man page does not describe this situation well, + * but a proposed change to POSIX adds these words for ENXIO: + * "or the whence argument is SEEK_DATA and the offset falls + * within the final hole of the file." + */ + pos = end; + } + else { + nbdkit_error ("lseek: SEEK_DATA: %" PRIu64 ": %m", offset); + return -1; + } + } + + /* We know there is a hole from offset to pos-1. */ + if (pos > offset) { + if (nbdkit_add_extent (extents, offset, pos - offset, + NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO) == -1) + return -1; + if (req_one) + break; + } + + offset = pos; + if (offset >= end) + break; + + pos = lseek (h->fd, offset, SEEK_HOLE); + if (pos == -1) { + nbdkit_error ("lseek: SEEK_HOLE: %" PRIu64 ": %m", offset); + return -1; + } + + /* We know there is data from offset to pos-1. */ + if (pos > offset) { + if (nbdkit_add_extent (extents, offset, pos - offset, + 0 /* allocated data */) == -1) + return -1; + if (req_one) + break; + } + + offset = pos; + } while (offset < end); + + return 0; +} + +static int +file_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) +{ + int r; + + pthread_mutex_lock (&lseek_lock); + r = do_extents (handle, count, offset, flags, extents); + pthread_mutex_unlock (&lseek_lock); + + return r; +} +#endif /* SEEK_HOLE */ + static struct nbdkit_plugin plugin = { .name = "file", .longname = "nbdkit file plugin", @@ -522,6 +637,10 @@ static struct nbdkit_plugin plugin = { .flush = file_flush, .trim = file_trim, .zero = file_zero, +#ifdef SEEK_HOLE + .can_extents = file_can_extents, + .extents = file_extents, +#endif .errno_is_preserved = 1, }; -- 2.20.1
Eric Blake
2019-Mar-23 16:25 UTC
Re: [Libguestfs] [PATCH nbdkit 1/8] server: Implement extents/can_extents calls for plugins and filters.
On 3/20/19 5:11 PM, Richard W.M. Jones wrote:> This pair of calls allows plugins to describe which extents in the > virtual disk are allocated, holes or zeroes. > ---I see you've already made a couple of tweaks to your block-status branch since posting this (skipping 0-legnth add_extent, and coalescing consecutive add_extents with the same type), but here's some more review:> +++ b/docs/nbdkit-filter.pod> +It is also possible for filters to transform the extents list received > +back from the layer below. Without error checking it would look like > +this: > + > + myfilter_extents (...) > + { > + size_t i; > + struct nbdkit_extents *extents2; > + struct nbdkit_extent e; > + > + extents2 = nbdkit_extents_new (offset); > + next_ops->extents (nxdata, count, offset, flags, extents2, err); > + for (i = 0; i < nbdkit_extents_size (extents2); ++i) { > + e = nbdkit_get_extent (extents2, i); > + e.offset = /* transform offset */; > + nbdkit_add_extent (extents, e.offset, e.length, e.type); > + } > + nbdkit_extents_free (extents2); > + } > + > +More complicated transformations may require the filter to allocate a > +new extents list using C<nbdkit_extents_new> and free the old one > +using C<nbdkit_extents_free>.This sentence sounds odd after the example. You'll probably want to reword this a bit. I see that you chose to track an extent per 'struct nbdkit_extent' rather than leaving it implicit based solely on what struct nbdkit_extents tracks, which means that the offset filter has to allocate a new one to pass to the plugin, AND has to adjust each extent before returning. Conceptually, you have something like: extents (starts at 1000): [ { offset=1000, length=100, type=0 }, { offset=1100, length=100, type=1 }, ... ] and since we allow plugins to add extents prior to the starting offset (as a no-op), the offset filter has to create a new filter with a new starting offset. As a thought experiment, I wonder if it would be any easier to have 'struct nbdkit_extents' track merely a next-expected offset, as well as a cumulative length, and then omit the offset from 'struct nbdkit_extent'; also, give filters a way to adjust the next expected offset. Then you could do something like: offsetfilter_extents (handle, offset=1000, count=500) { // extents has length=0, next=1000, array=[ ] // implied start=1000 nbdkit_extents_adjust_next (extents, h->offs); // extents has length=0, next=1000+h->offs, array=[ ] // implied start=1000+h->offs ret = next_ops->extents (nxdata, count, offsest + h->offs, flags, extents, err); // client populated extents: length=500, next=1500+h->offs, array=[ // { length=100, type=0 }, // offset would compute as 1000+h->offs // { length=100, type=1 }, // offset would compute as 1100+h->offs // ... // ], implied start=1000+h->offs nbdkit_extents_adjust_next (extents, -h->offs); // extents length=500, next=1500, array=[ // { length=100, type=0 }, // offset would compute as 1000 // { length=100, type=1 }, // offset would compute as 1100 // ... // ], implied start=1000 return ret; } Doing that may also make it easier to do write a filter that mixes its own holes with the underlying status of the plugin: // append hole owned by filter: nbdkit_extent_add(extents, hole_length, offset, hole) // change expected next offset for plugin int64_t delta = nbdkit_extents_adjust_next (extents, 0) // let plugin append its extents next_ops->extents (nxdata, count - hole_length, 0, flags, extents, err) // adjust things back to normal nbdkit_extents_adjust_next (extents, delta) Having typed that, it may require less computation time, but requires more thought about how it actually all works, so keeping your implementation (where the filter HAS to allocate a new extents, then copy the plugins answers back to its own, rather than being able to blindly reuse the plugins answers in place) is okay. As I said, mine was just a thought experiment for comparison, so it doesn't bother me if you don't use it.> + > +If there is an error, C<.extents> should call C<nbdkit_error> with an > +error message B<and> return -1 with C<err> set to the positive errno > +value to return to the client. > + > +=head3 Allocating and freeing nbdkit_extents list > + > +Two functions are provided to filters only for allocating and freeing > +the map: > + > + struct nbdkit_extents *nbdkit_extents_new (uint64_t start); > + > +Allocates and returns a new, empty extents list. The C<start> > +parameter is the start of the range described in the list. Normally > +you would pass in C<offset> as this parameter.Well, the offset that the plugin should start at (which might not be your C<offset>, if your filter is adjusting the plugin's offsets)> + > +On error this function can return C<NULL>. In this case it calls > +C<nbdkit_error> and/or C<nbdkit_set_error> as required. > + > + void nbdkit_extents_free (struct nbdkit_extents *); > + > +Frees an existing extents list. > + > +=head3 Iterating over nbdkit_extents list > + > +Two functions are provided to filters only to iterate over the extents > +in order: > + > + size_t nbdkit_extents_size (const struct nbdkit_extents *); > + > +Returns the number of extents in the list. > + > + struct nbdkit_extent { > + uint64_t offset; > + uint64_t length; > + uint32_t type; > + }; > + struct nbdkit_extent nbdkit_get_extent (const struct nbdkit_extents *, > + size_t i); > + > +Returns a copy of the C<i>'th extent.If you go with my thought experiment, it gets harder to provide this signature. This signature is certainly easier than the foreach() iterator with callback function of your previous version, though, so I do like how enforcing the client to work in ascending contiguous order simplified things.> + > =head1 ERROR HANDLING > > If there is an error in the filter itself, the filter should call > diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod > index 47545f3..a603f8b 100644 > --- a/docs/nbdkit-plugin.pod > +++ b/docs/nbdkit-plugin.pod > @@ -555,6 +555,20 @@ This callback is not required. If omitted, then nbdkit always tries > C<.zero> first if it is present, and gracefully falls back to > C<.pwrite> if C<.zero> was absent or failed with C<EOPNOTSUPP>. > > +=head2 C<.can_extents> > + > + int can_extents (void *handle); > + > +This is called during the option negotiation phase to find out if the > +plugin supports detecting allocated (non-sparse) regions of the disk > +with the C<.extents> callback. > + > +If there is an error, C<.can_extents> should call C<nbdkit_error> with > +an error message and return C<-1>. > + > +This callback is not required. If omitted, then we return true iff a > +C<.extents> callback has been defined.Should we default this to returning true always, _because_ we have the sane fallback of treating the entire image as allocated when .extents is absent? (That is, a plugin has to explicitly opt out of advertising NBD_CMD_BLOCK_STATUS support, because our default works well enough even if the plugin didn't do anything). It would match how filters can call next_ops->zero() even for plugins that do not have .zero (because .can_zero defaults to true).> +> +Extents overlapping the range C<[offset...offset+count-1]> should be > +returned if possible. However nbdkit ignores extents E<lt> offset so > +the plugin may, if it is easier to implement, return all extent > +information for the whole disk. The plugin may return extents beyond > +the end of the range. It may also return extent information for less > +than the whole range, but it must return at least one extent.must return at least one extent overlapping with C<[offset]>. (if the plugin returns all extent information prior to offset, but stops before offset is reached, that counts as not returning any extents).> + > +The extents B<must> be added in ascending order, and B<must> be > +contiguous. > + > +The C<flags> parameter may contain the flag C<NBDKIT_FLAG_REQ_ONE> > +which means that the client is only requesting information about the > +extent overlapping C<offset>. The plugin may ignore this flag, or as > +an optimization it may return just a single extent for C<offset>. > +(Note that implementing this optimization correctly is difficult and > +subtle. You must ensure that the upper limit of the single extent > +returned is correct. If unsure then the safest option is to ignore > +this flag.)I'm not sure we still need this parenthetical - with your simpler implementation, I think it is harder for plugins to get things wrong (the NBD protocol does NOT require the upper limit to be accurate, merely that it makes progress - although the NBD protocol recommends that servers return maximal upper bounds, it also warns clients that they must be prepared to coalesce short returns with identical type).> +++ b/include/nbdkit-filter.h > @@ -1,5 +1,5 @@ > /* nbdkit > - * Copyright (C) 2013-2018 Red Hat Inc. > + * Copyright (C) 2013-2019 Red Hat Inc. > * All rights reserved. > * > * Redistribution and use in source and binary forms, with or without > @@ -44,6 +44,18 @@ extern "C" { > > #define NBDKIT_FILTER_API_VERSION 2 > > +struct nbdkit_extent { > + uint64_t offset; > + uint64_t length; > + uint32_t type; > +}; > + > +extern struct nbdkit_extents *nbdkit_extents_new (uint64_t start); > +extern void nbdkit_extents_free (struct nbdkit_extents *); > +extern size_t nbdkit_extents_size (const struct nbdkit_extents *);'size' makes me wonder if this is the number of bytes covered by the extents object, rather than the number of extent objects in the array. Would calling it 'nbdkit_extents_count' be any easier to understand?> +++ b/server/extents.c > @@ -0,0 +1,171 @@> +struct nbdkit_extents { > + struct nbdkit_extent *extents; > + size_t nr_extents, allocated; > + uint64_t start; > +};Is it also worth tracking cumulative length covered by extents? Tracking it would give us an O(1) instead of O(n) answer to "how many bytes did the plugin tell us about" - which MAY matter to filters that want to append additional data about a hole after the underlying plugin (so the filter can tell if the plugin filled in the entire range that the filter requested, or if it filled in only a subset). But then again, if the filter has to allocate a new map to pass to the plugin, and then postprocess to copy from the temporary map back to the filter's incoming map, the filter is already doing O(n) work and can answer the question itself. (My thought proposal was a way to let such a filter do O(1) work by letting the client append directly into the same list that the filter will also be appending to - but I'm not sure it is necessary for the complexity it introduces). Is it worth tracking a maximum that the client asked for? When REQ_ONE is in effect, the maximum is a hard stop for what we answer to the client (but we should still let the client give more than we asked for); when REQ_ONE is not in effect, the protocol says that we can't add any more extents of a new type (we can only coalesce a larger end length to the existing last extent). Either way, tracking a maximum means that we could avoid wasting time and malloc()s on extents where the plugin gave us more information than we need, for any nbdkit_add_extent() with an offset beyond the maximum that does not coalesce with the existing final extent.> + > +size_t > +nbdkit_extents_size (const struct nbdkit_extents *exts) > +{ > + return exts->nr_extents; > +}Again, would nbdkit_extents_count() be a nicer name for this one?> +int > +nbdkit_add_extent (struct nbdkit_extents *exts, > + uint64_t offset, uint64_t length, uint32_t type) > +{ > + uint64_t overlap; > + > + /* If there are existing extents, the new extent must be contiguous. */or else ignored because it is beyond a maximum, if you like my idea of tracking the maximum the client cared about. Here's also where you added code to ignore 0-length plugin calls.> + > + const struct nbdkit_extent e > + { .offset = offset, .length = length, .type = type }; > + return append_extent (exts, &e);and here is where you added in code to coalesce plugin calls that returned the same type.> +++ b/server/plugins.c > @@ -1,5 +1,5 @@> > +static int > +plugin_can_extents (struct backend *b, struct connection *conn) > +{ > + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); > + > + assert (connection_get_handle (conn, 0)); > + > + debug ("can_extents"); > + > + if (p->plugin.can_extents) > + return p->plugin.can_extents (connection_get_handle (conn, 0)); > + else > + return p->plugin.extents != NULL;Again, making this default to true may be nicer, where a plugin has to opt out of our sane default behavior. We're getting closer. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-23 16:51 UTC
Re: [Libguestfs] [PATCH nbdkit 3/8] server: Implement Block Status requests to read allocation status.
On 3/20/19 5:11 PM, Richard W.M. Jones wrote:> This commit implements the core NBD protocol for the "base:allocation" > Block Status replies. > --- > server/internal.h | 7 ++ > server/protocol.h | 15 +++ > server/protocol-handshake-newstyle.c | 81 ++++++++++++++- > server/protocol.c | 141 ++++++++++++++++++++++++--- > 4 files changed, 229 insertions(+), 15 deletions(-) >> +static int > +send_structured_reply_block_status (struct connection *conn, > + uint64_t handle, > + uint16_t cmd, uint16_t flags, > + uint32_t count, uint64_t offset, > + struct nbdkit_extents *extents) > +{ > + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&conn->write_lock); > + struct structured_reply reply; > + size_t nr_extents = nbdkit_extents_size (extents); > + uint32_t context_id; > + size_t i; > + int r; > + > + assert (cmd == NBD_CMD_BLOCK_STATUS); > +Worth asserting conn->meta_context_base_allocation?> + > + /* Send each block descriptor. */ > + for (i = 0; i < nr_extents; ++i) {Where does the list terminate after 1 extent if REQ_ONE was set? Also, if REQ_ONE is set but the plugin provided coalesced status beyond the request, this would need to clamp the answer to the requested length.> + const struct nbdkit_extent e = nbdkit_get_extent (extents, i); > + struct block_descriptor bd; > + > + if (i == 0) > + assert (e.offset == offset); > + > + bd.length = htobe32 (e.length); > + bd.status_flags = htobe32 (e.type & 3); > + > + r = conn->send (conn, &bd, sizeof bd); > + if (r == -1) { > + nbdkit_error ("write reply: %s: %m", name_of_nbd_cmd (cmd)); > + return connection_set_status (conn, -1); > + } > + }Where does the list terminate once you send the final extent that overlaps the end of the original request? (If you implement my idea of tracking a maximum offset in nbdkit_extents in patch 1, this may be picked up automatically)> @@ -498,15 +605,23 @@ protocol_recv_request_send_reply (struct connection *conn) > } > > /* Currently we prefer to send simple replies for everything except > - * where we have to (ie. NBD_CMD_READ when structured_replies have > - * been negotiated). However this prevents us from sending > - * human-readable error messages to the client, so we should > - * reconsider this in future. > + * where we have to (ie. NBD_CMD_READ and NBD_CMD_BLOCK_STATUS when > + * structured_replies have been negotiated). However this prevents > + * us from sending human-readable error messages to the client, so > + * we should reconsider this in future. > */ > - if (conn->structured_replies && cmd == NBD_CMD_READ) { > - if (!error) > - return send_structured_reply_read (conn, request.handle, cmd, > - buf, count, offset); > + if (conn->structured_replies && > + (cmd == NBD_CMD_READ || cmd == NBD_CMD_BLOCK_STATUS)) { > + if (!error) { > + if (cmd == NBD_CMD_READ) > + return send_structured_reply_read (conn, request.handle, cmd, > + buf, count, offset); > + else /* NBD_CMD_BLOCK_STATUS */ > + return send_structured_reply_block_status (conn, request.handle, > + cmd, flags, > + count, offset, > + extents); > + } > else > return send_structured_reply_error (conn, request.handle, cmd, error);Ah, so you are already sending a structured error, even though the protocol didn't require it (which is the qemu bug I just sent a patch for today). Technically, the protocol requires a structured error for NBD_CMD_READ, but permits a simple error for NBD_CMD_BLOCK_STATUS; but portability wise, since qemu 2.12 through 3.1 choke on the simple error, this works around those versions of qemu. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-23 16:58 UTC
Re: [Libguestfs] [PATCH nbdkit 4/8] offset: Implement mapping of extents.
On 3/20/19 5:11 PM, Richard W.M. Jones wrote:> Allows you to safely use nbdkit-offset-filter on top of a plugin > supporting extents. > --- > filters/offset/offset.c | 35 +++++++++++++++++++++++++++++++++++ > 1 file changed, 35 insertions(+)If we don't change our minds on the interface in patch 1, then this looks correct.> +/* Extents. */ > +static int > +offset_extents (struct nbdkit_next_ops *next_ops, void *nxdata, > + void *handle, uint32_t count, uint64_t offs, uint32_t flags, > + struct nbdkit_extents *extents, int *err) > +{ > + size_t i; > + struct nbdkit_extents *extents2; > + struct nbdkit_extent e; > + > + extents2 = nbdkit_extents_new (offs + offset); > + if (extents2 == NULL) { > + *err = errno; > + return -1; > + }Ouch - nbdkit_extents_new() returns NULL for 'start >= INT64_MAX' without setting errno. Of course, that failure path should be unreachable here, but if you are going to assign *err based on errno, then we have to make sure patch 1 guarantees sane errno settings on all failure paths. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-23 17:01 UTC
Re: [Libguestfs] [PATCH nbdkit 5/8] partition: Implement mapping of extents.
On 3/20/19 5:11 PM, Richard W.M. Jones wrote:> --- > filters/partition/partition.c | 36 +++++++++++++++++++++++++++++++++++ > 1 file changed, 36 insertions(+)Looks the same as offset. Style question:> + if (next_ops->extents (nxdata, count, offs + h->offset, > + flags, extents2, err) == -1) > + goto error; > + > + for (i = 0; i < nbdkit_extents_size (extents2); ++i) { > + e = nbdkit_get_extent (extents2, i); > + e.offset -= h->offset; > + if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1)Would it be any simpler to avoid the -= above, and just call add_extent(extents, e.offset - h->offset, e.length, e.type) (If you do it in one patch, do it in both) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-23 17:05 UTC
Re: [Libguestfs] [PATCH nbdkit 6/8] data, memory: Implement extents.
On 3/20/19 5:11 PM, Richard W.M. Jones wrote:> These plugins are both based on the same sparse array structure which > supports a simple implementation of extents. > ---> +int > +sparse_array_extents (struct sparse_array *sa, > + uint32_t count, uint64_t offset, > + struct nbdkit_extents *extents) > +{ > + uint32_t n, type; > + void *p; > + > + while (count > 0) { > + p = lookup (sa, offset, false, &n, NULL); > + if (n > count) > + n = count;Why are we clamping the information we return? I'd move this clamp...> + > + if (p == NULL) > + type = NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO; > + else > + type = 0; /* allocated data */ > + if (nbdkit_add_extent (extents, offset, n, type) == -1) > + return -1;...here, after we've recorded the maximum amount of information possible into the extents array. (We need the clamp to terminate the loop, but that doesn't mean we have to truncate our answer)> + > + count -= n; > + offset += n; > + } > + > + return 0; > +}Otherwise, looks good. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-23 19:29 UTC
Re: [Libguestfs] [PATCH nbdkit 7/8] vddk: Implement extents.
On 3/20/19 5:11 PM, Richard W.M. Jones wrote:> This uses a new API VixDiskLib_QueryAllocatedBlocks provided in > VDDK >= 6.7. > > Thanks: Martin Kletzander. > --- > plugins/vddk/vddk-structs.h | 15 +++- > plugins/vddk/vddk.c | 138 ++++++++++++++++++++++++++++++++++++ > 2 files changed, 152 insertions(+), 1 deletion(-) >> +static int > +vddk_extents (void *handle, uint32_t count, uint64_t offset, uint32_t flags, > + struct nbdkit_extents *extents) > +{ > + struct vddk_handle *h = handle; > + bool req_one = flags & NBDKIT_FLAG_REQ_ONE; > + uint64_t position, end, start_sector; > + > + position = offset; > +...> + > + for (i = 0; i < block_list->numBlocks; ++i) { > + uint64_t offset, length; > + > + offset = block_list->blocks[i].offset * VIXDISKLIB_SECTOR_SIZE; > + length = block_list->blocks[i].length * VIXDISKLIB_SECTOR_SIZE; > + > + /* The query returns blocks. We must insert holes between the > + * blocks as necessary. > + */ > + if (position < offset) { > + if (nbdkit_add_extent (extents, > + offset, length, > + NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO) == -1) > + goto error_in_add; > + } > + > + if (nbdkit_add_extent (extents, > + offset, length, 0 /* allocated data */) == -1) { > + error_in_add: > + DEBUG_CALL ("VixDiskLib_FreeBlockList", "block_list"); > + VixDiskLib_FreeBlockList (block_list); > + return -1; > + } > + > + position = offset + length;This inner loop might be long; you could add this: if (req_one) break;> + } > + DEBUG_CALL ("VixDiskLib_FreeBlockList", "block_list"); > + VixDiskLib_FreeBlockList (block_list); > + > + start_sector += nr_sectors; > + > + if (req_one) > + break;just as you break the outer loop here, to make REQ_ONE slightly faster. Otherwise looks sane. m not in a position to test it, but if 'qemu-img map --output=json nbd://...' pointing to your server matches what you expect for vddk, that's a good sign. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-23 19:45 UTC
Re: [Libguestfs] [PATCH nbdkit 8/8] file: Implement extents.
On 3/20/19 5:11 PM, Richard W.M. Jones wrote:> This uses lseek SEEK_DATA/SEEK_HOLE to search for allocated data and > holes in the underlying file. > --- > plugins/file/file.c | 139 ++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 129 insertions(+), 10 deletions(-) >> -int file_debug_zero; /* to enable: -D file.zero=1 */ > +/* Any callbacks using lseek must be protected by this lock. */ > +static pthread_mutex_t lseek_lock = PTHREAD_MUTEX_INITIALIZER; > + > +/* to enable: -D file.zero=1 */ > +int file_debug_zero; > > static void > file_unload (void) > @@ -220,6 +227,21 @@ file_close (void *handle) > > #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL > > +/* For block devices, stat->st_size is not the true size. */ > +static int64_t > +block_device_size (int fd) > +{ > + off_t size; > + > + size = lseek (fd, 0, SEEK_END);Calling lseek without the lock? I'm not sure if you can guarantee that .size won't be called in parallel with some other .extents. /me reads ahead Oh, the caller has the lock. I might add a comment to this function that it expects the caller to grab the lock.> > +#ifdef SEEK_HOLE > +/* Extents. */ > + > +static int > +file_can_extents (void *handle) > +{ > + struct handle *h = handle; > + off_t r; > + > + /* A simple test to see whether SEEK_HOLE etc is likely to work on > + * the current filesystem. > + */ > + pthread_mutex_lock (&lseek_lock); > + r = lseek (h->fd, 0, SEEK_HOLE); > + pthread_mutex_unlock (&lseek_lock); > + if (r == -1) { > + nbdkit_debug ("extents disabled: lseek: SEEK_HOLE: %m"); > + return 0;In fact, if lseek() succeeds but returns the current file size, you know there are no holes in the file at all, in which case, it might be more efficient to store a variable stating that .extents should avoid lseek() altogether and just report the entire file as data. Hmm - if we just make .extents return false in this case, then nbdkit won't advertise block status to the guest even though we have valid status available (knowing there are no holes is nicer than merely assuming there are no holes because block status was unavailable). But if we return true, then WE have to do that optimization ourselves. Maybe .can_extent should be a tri-state return, just like .can_fua, where a plugin can choose NBDKIT_EXTENT_NONE to force no block status advertisement, NBDKIT_EXTENT_EMULATE (default if .extents is missing) to let nbdkit do the work of reporting the entire disk as data without calling .extents, and NBDKIT_EXTENT_NATIVE (requires .extents, and lets the plugin do all the work). Then, when lseek(SEEK_HOLE) returns EOF, we can return NBDKIT_EXTENT_EMULATE instead of having to track our optimization of skipping lseek in .extents ourselves.> +static int > +file_extents (void *handle, uint32_t count, uint64_t offset, > + uint32_t flags, struct nbdkit_extents *extents) > +{ > + int r; > + > + pthread_mutex_lock (&lseek_lock); > + r = do_extents (handle, count, offset, flags, extents); > + pthread_mutex_unlock (&lseek_lock); > +But this would be the spot where we could optimize by returning all data when our initial lseek() probe proved the file is not sparse. Otherwise, looks good. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Possibly Parallel Threads
- [PATCH nbdkit v4 00/15] Implement Block Status.
- [PATCH nbdkit v5 FINAL 00/19] Implement extents.
- Re: [PATCH nbdkit 1/8] server: Implement extents/can_extents calls for plugins and filters.
- [RFC nbdkit PATCH 0/3] aligned .extents
- [PATCH nbdkit 2/2] filters: Be careful to set *err if nbdkit_add_extent or nbdkit_extents_new fail.