Richard W.M. Jones
2019-Mar-12 17:11 UTC
[Libguestfs] [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.
Second version based on nbdkit_extent* calls, as discussed here: https://www.redhat.com/archives/libguestfs/2019-March/msg00033.html Note in particular there is some subtlety about how filters would work in this implementation. See docs/nbdkit-filter.pod for the details. Rich.
Richard W.M. Jones
2019-Mar-12 17:11 UTC
[Libguestfs] [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.
This pair of calls allows plugins to describe which extents in the virtual disk are allocated, holes or zeroes. --- docs/nbdkit-filter.pod | 45 +++++++++++++++++ docs/nbdkit-plugin.pod | 106 ++++++++++++++++++++++++++++++++++++++++ include/nbdkit-common.h | 18 ++++++- include/nbdkit-filter.h | 12 ++++- include/nbdkit-plugin.h | 6 ++- server/internal.h | 7 ++- server/extents.c | 67 +++++++++++++++++++++++++ server/filters.c | 61 +++++++++++++++++++++++ server/plugins.c | 63 +++++++++++++++++++++++- server/Makefile.am | 1 + 10 files changed, 381 insertions(+), 5 deletions(-) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index dc9a262..8bfd151 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -350,6 +350,8 @@ calls. =head2 C<.can_zero> +=head2 C<.can_extents> + =head2 C<.can_fua> =head2 C<.can_multi_conn> @@ -365,6 +367,8 @@ calls. void *handle); int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_extents) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -513,6 +517,47 @@ value to return to the client. The filter should never fail with C<EOPNOTSUPP> (while plugins have automatic fallback to C<.pwrite>, filters do not). +=head2 C<.extents> + + int (*extents) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents_map *extents_map, + int *err); + +This intercepts the plugin C<.extents> method and can be used to +modify extent requests. + +This function will not be called if C<.can_extents> returned false; in +turn, the filter should not call C<next_ops-E<gt>extents> if +C<next_ops-E<gt>can_extents> did not return true. + +The C<extents_map> parameter passed to this function is empty. If the +filter does not need to adjust extents from the underlying plugin it +can simply pass this through to the layer below: + + myfilter_extents (...) + { + return next_ops->extents (nxdata, count, offset, flags, extents_map, err); + } + +It is also possible for filters to transform the extents map received +back from the layer below. Usually this must be done by allocating a +new map which is passed to the layer below. Without error handling it +would look like this: + + myfilter_extents (...) + { + struct nbdkit_extents_map *map2 = nbdkit_extents_new (); + next_ops->extents (nxdata, count, offset, flags, map2, err); + /* transform map2 and return results in extents_map */ + nbdkit_extents_foreach (map2, transform_offset, extents_map); + nbdkit_extents_free (map2); + } + +If there is an error, C<.extents> should call C<nbdkit_error> with an +error message B<and> return -1 with C<err> set to the positive errno +value to return to the client. + =head1 ERROR HANDLING If there is an error in the filter itself, the filter should call diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod index 47545f3..0183502 100644 --- a/docs/nbdkit-plugin.pod +++ b/docs/nbdkit-plugin.pod @@ -555,6 +555,20 @@ This callback is not required. If omitted, then nbdkit always tries C<.zero> first if it is present, and gracefully falls back to C<.pwrite> if C<.zero> was absent or failed with C<EOPNOTSUPP>. +=head2 C<.can_extents> + + int can_extents (void *handle); + +This is called during the option negotiation phase to find out if the +plugin supports detecting allocated (non-sparse) regions of the disk +with the C<.extents> callback. + +If there is an error, C<.can_extents> should call C<nbdkit_error> with +an error message and return C<-1>. + +This callback is not required. If omitted, then we return true iff a +C<.extents> callback has been defined. + =head2 C<.can_fua> int can_fua (void *handle); @@ -717,6 +731,98 @@ If there is an error, C<.zero> should call C<nbdkit_error> with an error message, and C<nbdkit_set_error> to record an appropriate error (unless C<errno> is sufficient), then return C<-1>. +=head2 C<.extents> + + int extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, + struct nbdkit_extents_map *extents_map); + +During the data serving phase, this callback is used to +detect allocated (non-sparse) regions of the disk. + +This function will not be called if C<.can_extents> returned false. + +The callback should detect and return the list of extents overlapping +the range C<[offset...offset+count-1]>. The C<extents_map> parameter +points to an opaque object which the callback should fill in by +calling C<nbdkit_extent_add> etc. See L</Extents map> below. + +The C<flags> parameter may contain the flag C<NBDKIT_FLAG_REQ_ONE> +which means that the client is only requesting information about the +extent overlapping C<offset>. The plugin may ignore this flag, or as +an optimization it may return just a single extent for C<offset>. + +If there is an error, C<.extents> should call C<nbdkit_error> with an +error message, and C<nbdkit_set_error> to record an appropriate error +(unless C<errno> is sufficient), then return C<-1>. + +=head3 Extents map + +The plugin C<extents> callback is passed an opaque pointer C<struct +nbdkit_extents_map *extents_map>. This structure represents a list of +L<filesystem extents|https://en.wikipedia.org/wiki/Extent_(file_systems)> +describing which areas of the disk are allocated, which are sparse +(“holes”), and, if supported, which are zeroes. Initially the map +describes a single hole covering the entire virtual disk. + +The C<extents> callback has to fill in any allocated or zeroes areas +by calling C<nbdkit_extent*> functions, to describe all extents +overlapping the range C<[offset...offset+count-1]>. (nbdkit will +ignore extents outside this range so plugins may, if it is easier to +implement, return all extent information for the whole disk.) + +The following functions are available for plugins or filters to call: + + int nbdkit_extent_add (struct nbdkit_extents_map *extents_map, + uint64_t offset, uint64_t length, uint32_t type); + +Add an extent covering C<[offset...offset+length-1]> of one of +the following types: + +=over 4 + +=item NBDKIT_EXTENT_TYPE_HOLE + +An unallocated extent, a.k.a. a “hole”. + +=item NBDKIT_EXTENT_TYPE_DATA + +A normal extent containing data. + +=item NBDKIT_EXTENT_TYPE_ZERO + +An extent which is allocated and contains all zero bytes. + +=back + +If the new extent overlaps some previous extent in the map then the +overlapping part(s) are overwritten with the new type. Adjacent +extents of the same type may also be coalesced. Thus you shouldn't +assume that C<nbdkit_extent_add> always adds exactly one more extent +to the map. + +C<nbdkit_extent_add> returns C<0> on success or C<-1> on failure. On +failure C<nbdkit_error> and/or C<nbdkit_set_error> has already been +called. + + void nbdkit_extents_clear (struct nbdkit_extents_map *extents_map); + +Clear the extents map. After this the map will have a single hole +covering the entire disk. + + int nbdkit_extents_foreach (const struct nbdkit_extents_map *extents_map, + int (*fn) (uint64_t offset, uint64_t length, + uint32_t type, void *opaque), + void *opaque); + +Iterate over the extents in order, calling C<fn> for each. C<fn> +should return 0 on success or -1 on failure. If a call to C<fn> fails +then C<nbdkit_extents_foreach> also fails immediately. + +C<nbdkit_extents_foreach> returns C<0> on success or C<-1> on failure. +However it does not call C<nbdkit_error> or C<nbdkit_set_error> - it +is expected that C<fn> will call those functions on failure. + =head1 THREADS Each nbdkit plugin must declare its thread safety model by defining diff --git a/include/nbdkit-common.h b/include/nbdkit-common.h index cb9954e..c6d5797 100644 --- a/include/nbdkit-common.h +++ b/include/nbdkit-common.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -60,11 +60,16 @@ extern "C" { #define NBDKIT_FLAG_MAY_TRIM (1<<0) /* Maps to !NBD_CMD_FLAG_NO_HOLE */ #define NBDKIT_FLAG_FUA (1<<1) /* Maps to NBD_CMD_FLAG_FUA */ +#define NBDKIT_FLAG_REQ_ONE (1<<2) /* Maps to NBD_CMD_FLAG_REQ_ONE */ #define NBDKIT_FUA_NONE 0 #define NBDKIT_FUA_EMULATE 1 #define NBDKIT_FUA_NATIVE 2 +#define NBDKIT_EXTENT_TYPE_HOLE 0 +#define NBDKIT_EXTENT_TYPE_DATA 1 +#define NBDKIT_EXTENT_TYPE_ZERO 2 + extern void nbdkit_error (const char *msg, ...) ATTRIBUTE_FORMAT_PRINTF (1, 2); extern void nbdkit_verror (const char *msg, va_list args); extern void nbdkit_debug (const char *msg, ...) ATTRIBUTE_FORMAT_PRINTF (1, 2); @@ -76,6 +81,17 @@ extern int nbdkit_parse_bool (const char *str); extern int nbdkit_read_password (const char *value, char **password); extern char *nbdkit_realpath (const char *path); +struct nbdkit_extents_map; +extern struct nbdkit_extents_map *nbdkit_extents_new (void); +extern void nbdkit_extents_free (struct nbdkit_extents_map *); +extern void nbdkit_extents_clear (struct nbdkit_extents_map *); +extern int nbdkit_extent_add (struct nbdkit_extents_map *, + uint64_t offset, uint64_t length, uint32_t type); +extern int nbdkit_extents_foreach (const struct nbdkit_extents_map *, + int (*fn) (uint64_t offset, uint64_t length, + uint32_t type, void *opaque), + void *opaque); + /* A static non-NULL pointer which can be used when you don't need a * per-connection handle. */ diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h index 71c06c8..b40fe2d 100644 --- a/include/nbdkit-filter.h +++ b/include/nbdkit-filter.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -58,6 +58,7 @@ struct nbdkit_next_ops { int (*is_rotational) (void *nxdata); int (*can_trim) (void *nxdata); int (*can_zero) (void *nxdata); + int (*can_extents) (void *nxdata); int (*can_fua) (void *nxdata); int (*can_multi_conn) (void *nxdata); @@ -71,6 +72,9 @@ struct nbdkit_next_ops { int *err); int (*zero) (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, int *err); + int (*extents) (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents_map *extents_map, + int *err); }; struct nbdkit_filter { @@ -120,6 +124,8 @@ struct nbdkit_filter { void *handle); int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_extents) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -140,6 +146,10 @@ struct nbdkit_filter { int (*zero) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, uint32_t count, uint64_t offset, uint32_t flags, int *err); + int (*extents) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents_map *extents_map, + int *err); }; #define NBDKIT_REGISTER_FILTER(filter) \ diff --git a/include/nbdkit-plugin.h b/include/nbdkit-plugin.h index d43b2f5..76cec54 100644 --- a/include/nbdkit-plugin.h +++ b/include/nbdkit-plugin.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -125,6 +125,10 @@ struct nbdkit_plugin { const char *magic_config_key; int (*can_multi_conn) (void *handle); + + int (*can_extents) (void *handle); + int (*extents) (void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents_map *extents_map); }; extern void nbdkit_set_error (int err); diff --git a/server/internal.h b/server/internal.h index 506882b..e05e5b5 100644 --- a/server/internal.h +++ b/server/internal.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -235,6 +235,7 @@ struct backend { int (*is_rotational) (struct backend *, struct connection *conn); int (*can_trim) (struct backend *, struct connection *conn); int (*can_zero) (struct backend *, struct connection *conn); + int (*can_extents) (struct backend *, struct connection *conn); int (*can_fua) (struct backend *, struct connection *conn); int (*can_multi_conn) (struct backend *, struct connection *conn); @@ -248,6 +249,10 @@ struct backend { uint64_t offset, uint32_t flags, int *err); int (*zero) (struct backend *, struct connection *conn, uint32_t count, uint64_t offset, uint32_t flags, int *err); + int (*extents) (struct backend *, struct connection *conn, uint32_t count, + uint64_t offset, uint32_t flags, + struct nbdkit_extents_map *extents_map, + int *err); }; /* plugins.c */ diff --git a/server/extents.c b/server/extents.c new file mode 100644 index 0000000..b4f2ef3 --- /dev/null +++ b/server/extents.c @@ -0,0 +1,67 @@ +/* nbdkit + * Copyright (C) 2019 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <string.h> +#include <inttypes.h> +#include <assert.h> + +#include "internal.h" + +struct nbdkit_extents_map { + // TBD XXX +}; + +#if 0 +struct nbdkit_extents_map *nbdkit_extents_new (void); +void nbdkit_extents_free (struct nbdkit_extents_map *); +void nbdkit_extents_clear (struct nbdkit_extents_map *); +#endif + +int nbdkit_extent_add (struct nbdkit_extents_map *extents_map, + uint64_t offset, uint64_t length, uint32_t type) +{ + // XXX + return 0; +} + +#if 0 +int nbdkit_extents_foreach (const struct nbdkit_extents_map *, + int (*fn) (uint64_t offset, uint64_t length, + uint32_t type, void *opaque), + void *opaque); +#endif diff --git a/server/filters.c b/server/filters.c index 5b7abc4..e284080 100644 --- a/server/filters.c +++ b/server/filters.c @@ -309,6 +309,13 @@ next_can_zero (void *nxdata) return b_conn->b->can_zero (b_conn->b, b_conn->conn); } +static int +next_can_extents (void *nxdata) +{ + struct b_conn *b_conn = nxdata; + return b_conn->b->can_extents (b_conn->b, b_conn->conn); +} + static int next_can_fua (void *nxdata) { @@ -364,6 +371,16 @@ next_zero (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, return b_conn->b->zero (b_conn->b, b_conn->conn, count, offset, flags, err); } +static int +next_extents (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents_map *extents_map, + int *err) +{ + struct b_conn *b_conn = nxdata; + return b_conn->b->extents (b_conn->b, b_conn->conn, count, offset, flags, + extents_map, err); +} + static struct nbdkit_next_ops next_ops = { .get_size = next_get_size, .can_write = next_can_write, @@ -371,6 +388,7 @@ static struct nbdkit_next_ops next_ops = { .is_rotational = next_is_rotational, .can_trim = next_can_trim, .can_zero = next_can_zero, + .can_extents = next_can_extents, .can_fua = next_can_fua, .can_multi_conn = next_can_multi_conn, .pread = next_pread, @@ -378,6 +396,7 @@ static struct nbdkit_next_ops next_ops = { .flush = next_flush, .trim = next_trim, .zero = next_zero, + .extents = next_extents, }; static int @@ -511,6 +530,21 @@ filter_can_zero (struct backend *b, struct connection *conn) return f->backend.next->can_zero (f->backend.next, conn); } +static int +filter_can_extents (struct backend *b, struct connection *conn) +{ + struct backend_filter *f = container_of (b, struct backend_filter, backend); + void *handle = connection_get_handle (conn, f->backend.i); + struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; + + debug ("%s: can_extents", f->name); + + if (f->filter.can_extents) + return f->filter.can_extents (&next_ops, &nxdata, handle); + else + return f->backend.next->can_extents (f->backend.next, conn); +} + static int filter_can_fua (struct backend *b, struct connection *conn) { @@ -646,6 +680,31 @@ filter_zero (struct backend *b, struct connection *conn, count, offset, flags, err); } +static int +filter_extents (struct backend *b, struct connection *conn, + uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents_map *extents_map, + int *err) +{ + struct backend_filter *f = container_of (b, struct backend_filter, backend); + void *handle = connection_get_handle (conn, f->backend.i); + struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; + + assert (!(flags & ~NBDKIT_FLAG_REQ_ONE)); + + debug ("%s: extents count=%" PRIu32 " offset=%" PRIu64 " flags=0x%" PRIx32, + f->name, count, offset, flags); + + if (f->filter.extents) + return f->filter.extents (&next_ops, &nxdata, handle, + count, offset, flags, + extents_map, err); + else + return f->backend.next->extents (f->backend.next, conn, + count, offset, flags, + extents_map, err); +} + static struct backend filter_functions = { .free = filter_free, .thread_model = filter_thread_model, @@ -667,6 +726,7 @@ static struct backend filter_functions = { .is_rotational = filter_is_rotational, .can_trim = filter_can_trim, .can_zero = filter_can_zero, + .can_extents = filter_can_extents, .can_fua = filter_can_fua, .can_multi_conn = filter_can_multi_conn, .pread = filter_pread, @@ -674,6 +734,7 @@ static struct backend filter_functions = { .flush = filter_flush, .trim = filter_trim, .zero = filter_zero, + .extents = filter_extents, }; /* Register and load a filter. */ diff --git a/server/plugins.c b/server/plugins.c index 28b96ad..5d223e1 100644 --- a/server/plugins.c +++ b/server/plugins.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -199,6 +199,8 @@ plugin_dump_fields (struct backend *b) HAS (trim); HAS (zero); HAS (can_multi_conn); + HAS (can_extents); + HAS (extents); #undef HAS /* Custom fields. */ @@ -392,6 +394,21 @@ plugin_can_zero (struct backend *b, struct connection *conn) return plugin_can_write (b, conn); } +static int +plugin_can_extents (struct backend *b, struct connection *conn) +{ + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + + assert (connection_get_handle (conn, 0)); + + debug ("can_extents"); + + if (p->plugin.can_extents) + return p->plugin.can_extents (connection_get_handle (conn, 0)); + else + return p->plugin.extents != NULL; +} + static int plugin_can_fua (struct backend *b, struct connection *conn) { @@ -651,6 +668,48 @@ plugin_zero (struct backend *b, struct connection *conn, return r; } +static int +plugin_extents (struct backend *b, struct connection *conn, + uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents_map *extents_map, + int *err) +{ + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + bool req_one = flags & NBDKIT_FLAG_REQ_ONE; + int r; + int can_extents = 0; /* XXX probably should be cached per connection */ + + assert (connection_get_handle (conn, 0)); + assert (!(flags & ~NBDKIT_FLAG_REQ_ONE)); + + debug ("extents count=%" PRIu32 " offset=%" PRIu64 " req_one=%d", + count, offset, req_one); + + if (!count) + return 0; + + if (p->plugin.can_extents) { + can_extents = p->plugin.can_extents (connection_get_handle (conn, 0)); + if (can_extents == -1) { + *err = get_error (p); + return -1; + } + } + + if (can_extents) { + r = p->plugin.extents (connection_get_handle (conn, 0), count, offset, + flags, extents_map); + if (r == -1) + *err = get_error (p); + return r; + } + else { + /* By default we assume that everything in the range is allocated. */ + return nbdkit_extent_add (extents_map, offset, count, + NBDKIT_EXTENT_TYPE_DATA); + } +} + static struct backend plugin_functions = { .free = plugin_free, .thread_model = plugin_thread_model, @@ -672,6 +731,7 @@ static struct backend plugin_functions = { .is_rotational = plugin_is_rotational, .can_trim = plugin_can_trim, .can_zero = plugin_can_zero, + .can_extents = plugin_can_extents, .can_fua = plugin_can_fua, .can_multi_conn = plugin_can_multi_conn, .pread = plugin_pread, @@ -679,6 +739,7 @@ static struct backend plugin_functions = { .flush = plugin_flush, .trim = plugin_trim, .zero = plugin_zero, + .extents = plugin_extents, }; /* Register and load a plugin. */ diff --git a/server/Makefile.am b/server/Makefile.am index b198afb..1c2fe06 100644 --- a/server/Makefile.am +++ b/server/Makefile.am @@ -43,6 +43,7 @@ nbdkit_SOURCES = \ connections.c \ crypto.c \ debug.c \ + extents.c \ filters.c \ internal.h \ locks.c \ -- 2.20.1
Eric Blake
2019-Mar-12 18:58 UTC
Re: [Libguestfs] [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.
On 3/12/19 12:11 PM, Richard W.M. Jones wrote:> This pair of calls allows plugins to describe which extents in the > virtual disk are allocated, holes or zeroes. > ---> +++ b/docs/nbdkit-filter.pod> +The C<extents_map> parameter passed to this function is empty.True only if any earlier filter also passed in an empty map. Maybe it's worth explicitly mentioning that the filter must in turn pass an empty map if it calls next_ops?> If the > +filter does not need to adjust extents from the underlying plugin it > +can simply pass this through to the layer below: > + > + myfilter_extents (...) > + { > + return next_ops->extents (nxdata, count, offset, flags, extents_map, err); > + } > + > +It is also possible for filters to transform the extents map received > +back from the layer below. Usually this must be done by allocating a > +new map which is passed to the layer below. Without error handling it > +would look like this: > + > + myfilter_extents (...) > + { > + struct nbdkit_extents_map *map2 = nbdkit_extents_new (); > + next_ops->extents (nxdata, count, offset, flags, map2, err); > + /* transform map2 and return results in extents_map */ > + nbdkit_extents_foreach (map2, transform_offset, extents_map); > + nbdkit_extents_free (map2); > + }And the fact that we can easily call nbdkit_extents_new() as needed means that a filter could, for example, call next_ops->extents() even for its pread implementation (of course, only for a plugin that .can_extents), if the filter knowing which parts of the plugin are sparse can make the filter run more efficiently on calls other than NBD_CMD_BLOCK_STATUS.> +=head2 C<.extents> > + > + int extents (void *handle, uint32_t count, uint64_t offset, > + uint32_t flags, > + struct nbdkit_extents_map *extents_map); > + > +During the data serving phase, this callback is used to > +detect allocated (non-sparse) regions of the disk. > + > +This function will not be called if C<.can_extents> returned false.Is it worth mentioning that nbdkit's default behavior is to treat the entire image as allocated non-zero if .can_extents returns false?> + > +The callback should detect and return the list of extents overlapping > +the range C<[offset...offset+count-1]>. The C<extents_map> parameter > +points to an opaque object which the callback should fill in by > +calling C<nbdkit_extent_add> etc. See L</Extents map> below. > + > +The C<flags> parameter may contain the flag C<NBDKIT_FLAG_REQ_ONE> > +which means that the client is only requesting information about the > +extent overlapping C<offset>. The plugin may ignore this flag, or as > +an optimization it may return just a single extent for C<offset>. > + > +If there is an error, C<.extents> should call C<nbdkit_error> with an > +error message, and C<nbdkit_set_error> to record an appropriate error > +(unless C<errno> is sufficient), then return C<-1>. > + > +=head3 Extents map > + > +The plugin C<extents> callback is passed an opaque pointer C<struct > +nbdkit_extents_map *extents_map>. This structure represents a list of > +L<filesystem extents|https://en.wikipedia.org/wiki/Extent_(file_systems)> > +describing which areas of the disk are allocated, which are sparse > +(“holes”), and, if supported, which are zeroes. Initially the map > +describes a single hole covering the entire virtual disk.Is an initial hole the wisest default, or should the initial state be data, and the plugin can do an nbdkit_extent_add(0, size, HOLE) if the client finds it easier to start with all holes? After all, the NBD protocol states that returning data/allocated is always the safest default (returning hole and/or read-zeroes should only be done when we are sure about it, but returning data/allocated is fine even if it causes the client to be less efficient for reading holes that it could have otherwise skipped).> + > +The C<extents> callback has to fill in any allocated or zeroes areas > +by calling C<nbdkit_extent*> functions, to describe all extents > +overlapping the range C<[offset...offset+count-1]>. (nbdkit will > +ignore extents outside this range so plugins may, if it is easier to > +implement, return all extent information for the whole disk.)Is it worth mentioning that speed is slightly more important than accuracy (that is, it's better to quickly return DATA than to slowly read the are in question and memcmp() to see if it is all zeros in order to return a more precise ZERO; ideally, returning anything other than DATA should only be done when the time collecting that information is proportionate to the number of requests rather to the size of the region being queried). Is nbdkit allowed to cache the information learned from the plugin for regions outside the client's initial query for later use, or are we better off always asking for information again (even if the answer we get will be the same)?> + > +The following functions are available for plugins or filters to call: > + > + int nbdkit_extent_add (struct nbdkit_extents_map *extents_map, > + uint64_t offset, uint64_t length, uint32_t type); > + > +Add an extent covering C<[offset...offset+length-1]> of one of > +the following types: > + > +=over 4 > + > +=item NBDKIT_EXTENT_TYPE_HOLE > + > +An unallocated extent, a.k.a. a “hole”. > + > +=item NBDKIT_EXTENT_TYPE_DATA > + > +A normal extent containing data. > + > +=item NBDKIT_EXTENT_TYPE_ZERO > + > +An extent which is allocated and contains all zero bytes.The protocol allows all four combinations of the two bits (you could have a non-allocated area not known to read as zeroes, for example, some SCSI disks have this property); should we likewise have four potential different types, rather than just three?> + > +=back > + > +If the new extent overlaps some previous extent in the map then the > +overlapping part(s) are overwritten with the new type. Adjacent > +extents of the same type may also be coalesced. Thus you shouldn't > +assume that C<nbdkit_extent_add> always adds exactly one more extent > +to the map.That's a nice aspect - it means a client CAN call add(0, full_size, HOLE) and then refine it with subsequent add(offset, length, DATA). It also seems like a fairly straightforward implementation - keep a sorted list of extents, and then split/coalescse into the right sorted location as more add() requests come in. Then it is an implementation detail whether we track the list by array, linked list, or binary tree, which in turn may be determined by how many extents we expect to be dealing with (choosing an implementation with O(log n) lookup/insertion rather than O(n) in case the client gives us LOTS of add() calls may suggest that a tree is best, but prototyping with an array for the initial cut is less code complexity).> + > +C<nbdkit_extent_add> returns C<0> on success or C<-1> on failure. On > +failure C<nbdkit_error> and/or C<nbdkit_set_error> has already been > +called. > + > + void nbdkit_extents_clear (struct nbdkit_extents_map *extents_map); > + > +Clear the extents map. After this the map will have a single hole > +covering the entire disk. > + > + int nbdkit_extents_foreach (const struct nbdkit_extents_map *extents_map, > + int (*fn) (uint64_t offset, uint64_t length, > + uint32_t type, void *opaque), > + void *opaque); > + > +Iterate over the extents in order, calling C<fn> for each. C<fn> > +should return 0 on success or -1 on failure. If a call to C<fn> fails > +then C<nbdkit_extents_foreach> also fails immediately.Question - should foreach allow the caller to pass in bounds? That is, since the plugin can populate 0 - full_size, but where we know the client only asked for information on offset - length, it may be more efficient to visit just the extents lying within those bounds. (Maybe allow -1 as a shortcut rather than an actual end size, when the ending offset is less important). Is it worth an explicit guarantee that foreach() visits extents in ascending order, regardless of whether add() was called out of order? (Ties us to our implementation where we split/coalesce per add - but seems like it can make life easier for filters if we do have that guarantee).> + > +C<nbdkit_extents_foreach> returns C<0> on success or C<-1> on failure. > +However it does not call C<nbdkit_error> or C<nbdkit_set_error> - it > +is expected that C<fn> will call those functions on failure.either the failing C<fn>, or the caller of foreach(). No documentation of nbdkit_extents_{new,free}?> > +#define NBDKIT_EXTENT_TYPE_HOLE 0 > +#define NBDKIT_EXTENT_TYPE_DATA 1 > +#define NBDKIT_EXTENT_TYPE_ZERO 2 > +Repeating the question above about whether to expose all four bit combinations possible in the NBD protocol.> +++ b/server/extents.c > @@ -0,0 +1,67 @@> +struct nbdkit_extents_map { > + // TBD XXX > +};Okay, so we're still in the design phase :)> +static int > +plugin_extents (struct backend *b, struct connection *conn, > + uint32_t count, uint64_t offset, uint32_t flags, > + struct nbdkit_extents_map *extents_map, > + int *err) > +{ > + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); > + bool req_one = flags & NBDKIT_FLAG_REQ_ONE; > + int r; > + int can_extents = 0; /* XXX probably should be cached per connection */There's probably a lot of the .can_FOO calls that should be cached per connection. Still, it's looking fairly reasonable. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Possibly Parallel Threads
- [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.
- Re: [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.
- [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.
- [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.
- [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.