Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 00/19] Implement extents.
This has already been pushed upstream. I am simply posting these here so we have a reference in the mailing list in case we find bugs later (as I'm sure we will - it's a complex patch series). Great thanks to Eric Blake for tireless review on this one. It also seems to have identified a few minor bugs in qemu along the way. Rich.
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 01/19] server: Implement extents/can_extents calls for plugins and filters.
This new pair of callbacks allows plugins to describe which extents in the virtual disk are allocated, holes or zeroes. --- docs/nbdkit-filter.pod | 83 ++++++++++++++++ docs/nbdkit-plugin.pod | 97 +++++++++++++++++++ include/nbdkit-common.h | 10 +- include/nbdkit-filter.h | 22 ++++- include/nbdkit-plugin.h | 6 +- server/internal.h | 4 + server/extents.c | 210 ++++++++++++++++++++++++++++++++++++++++ server/filters.c | 59 +++++++++++ server/plugins.c | 54 ++++++++++- server/Makefile.am | 1 + server/nbdkit.syms | 5 + 11 files changed, 547 insertions(+), 4 deletions(-) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index dc9a262..9e51c68 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -350,6 +350,8 @@ calls. =head2 C<.can_zero> +=head2 C<.can_extents> + =head2 C<.can_fua> =head2 C<.can_multi_conn> @@ -365,6 +367,8 @@ calls. void *handle); int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_extents) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -513,6 +517,85 @@ value to return to the client. The filter should never fail with C<EOPNOTSUPP> (while plugins have automatic fallback to C<.pwrite>, filters do not). +=head2 C<.extents> + + int (*extents) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, + int *err); + +This intercepts the plugin C<.extents> method and can be used to +modify extent requests. + +This function will not be called if C<.can_extents> returned false; in +turn, the filter should not call C<next_ops-E<gt>extents> if +C<next_ops-E<gt>can_extents> did not return true. + +It is possible for filters to transform the extents list received back +from the layer below. Without error checking it would look like this: + + myfilter_extents (..., uint32_t count, uint64_t offset, ...) + { + size_t i; + struct nbdkit_extents *extents2; + struct nbdkit_extent e; + int64_t size; + + size = next_ops->get_size (nxdata); + extents2 = nbdkit_extents_new (offset + shift, size - shift); + next_ops->extents (nxdata, count, offset + shift, flags, extents2, err); + for (i = 0; i < nbdkit_extents_count (extents2); ++i) { + e = nbdkit_get_extent (extents2, i); + e.offset -= shift; + nbdkit_add_extent (extents, e.offset, e.length, e.type); + } + nbdkit_extents_free (extents2); + } + +If there is an error, C<.extents> should call C<nbdkit_error> with an +error message B<and> return -1 with C<err> set to the positive errno +value to return to the client. + +=head3 Allocating and freeing nbdkit_extents list + +Two functions are provided to filters only for allocating and freeing +the map: + + struct nbdkit_extents *nbdkit_extents_new (uint64_t start, uint64_t end); + +Allocates and returns a new, empty extents list. The C<start> +parameter is the start of the range described in the list, and the +C<end> parameter is the offset of the byte beyond the end. Normally +you would pass in C<offset> as the start and the size of the plugin as +the end, but for filters which adjust offsets, they should pass in the +adjusted offset. + +On error this function can return C<NULL>. In this case it calls +C<nbdkit_error> and/or C<nbdkit_set_error> as required. + + void nbdkit_extents_free (struct nbdkit_extents *); + +Frees an existing extents list. + +=head3 Iterating over nbdkit_extents list + +Two functions are provided to filters only to iterate over the extents +in order: + + size_t nbdkit_extents_count (const struct nbdkit_extents *); + +Returns the number of extents in the list. + + struct nbdkit_extent { + uint64_t offset; + uint64_t length; + uint32_t type; + }; + struct nbdkit_extent nbdkit_get_extent (const struct nbdkit_extents *, + size_t i); + +Returns a copy of the C<i>'th extent. + =head1 ERROR HANDLING If there is an error in the filter itself, the filter should call diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod index 47545f3..272ec67 100644 --- a/docs/nbdkit-plugin.pod +++ b/docs/nbdkit-plugin.pod @@ -555,6 +555,20 @@ This callback is not required. If omitted, then nbdkit always tries C<.zero> first if it is present, and gracefully falls back to C<.pwrite> if C<.zero> was absent or failed with C<EOPNOTSUPP>. +=head2 C<.can_extents> + + int can_extents (void *handle); + +This is called during the option negotiation phase to find out if the +plugin supports detecting allocated (non-sparse) regions of the disk +with the C<.extents> callback. + +If there is an error, C<.can_extents> should call C<nbdkit_error> with +an error message and return C<-1>. + +This callback is not required. If omitted, then we return true iff a +C<.extents> callback has been defined. + =head2 C<.can_fua> int can_fua (void *handle); @@ -717,6 +731,89 @@ If there is an error, C<.zero> should call C<nbdkit_error> with an error message, and C<nbdkit_set_error> to record an appropriate error (unless C<errno> is sufficient), then return C<-1>. +=head2 C<.extents> + + int extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents); + +During the data serving phase, this callback is used to detect +allocated, sparse and zeroed regions of the disk. + +This function will not be called if C<.can_extents> returned false. +nbdkit's default behaviour in this case is to treat the whole virtual +disk as if it was allocated. + +The callback should detect and return the list of extents overlapping +the range C<[offset...offset+count-1]>. The C<extents> parameter +points to an opaque object which the callback should fill in by +calling C<nbdkit_add_extent>. See L</Extents list> below. + +If there is an error, C<.extents> should call C<nbdkit_error> with an +error message, and C<nbdkit_set_error> to record an appropriate error +(unless C<errno> is sufficient), then return C<-1>. + +=head3 Extents list + +The plugin C<extents> callback is passed an opaque pointer C<struct +nbdkit_extents *extents>. This structure represents a list of +L<filesystem extents|https://en.wikipedia.org/wiki/Extent_(file_systems)> +describing which areas of the disk are allocated, which are sparse +(“holes”), and, if supported, which are zeroes. + +The C<extents> callback should scan the disk starting at C<offset> and +call C<nbdkit_add_extent> for each extent found. + +Extents overlapping the range C<[offset...offset+count-1]> should be +returned if possible. However nbdkit ignores extents E<lt> offset so +the plugin may, if it is easier to implement, return all extent +information for the whole disk. The plugin may return extents beyond +the end of the range. It may also return extent information for less +than the whole range, but it must return at least one extent +overlapping C<offset>. + +The extents B<must> be added in ascending order, and B<must> be +contiguous. + +The C<flags> parameter of the C<.extents> callback may contain the +flag C<NBDKIT_FLAG_REQ_ONE>. This means that the client is only +requesting information about the extent overlapping C<offset>. The +plugin may ignore this flag, or as an optimization it may return just +a single extent for C<offset>. + + int nbdkit_add_extent (struct nbdkit_extents *extents, + uint64_t offset, uint64_t length, uint32_t type); + +Add an extent covering C<[offset...offset+length-1]> of one of +the following four types: + +=over 4 + +=item C<type = 0> + +A normal, allocated data extent. + +=item C<type = NBDKIT_EXTENT_HOLE|NBDKIT_EXTENT_ZERO> + +An unallocated extent, a.k.a. a “hole”, which reads back as zeroes. +This is the normal type of hole applicable to most disks. + +=item C<type = NBDKIT_EXTENT_ZERO> + +An allocated extent which is known to contain only zeroes. + +=item C<type = NBDKIT_EXTENT_HOLE> + +An unallocated extent (hole) which does not read back as zeroes. Note +this should only be used in specialized circumstances such as when +writing a plugin for (or to emulate) certain SCSI drives which do not +guarantee that trimmed blocks read back as zeroes. + +=back + +C<nbdkit_extent_add> returns C<0> on success or C<-1> on failure. On +failure C<nbdkit_error> and/or C<nbdkit_set_error> has already been +called. + =head1 THREADS Each nbdkit plugin must declare its thread safety model by defining diff --git a/include/nbdkit-common.h b/include/nbdkit-common.h index cb9954e..7c69b14 100644 --- a/include/nbdkit-common.h +++ b/include/nbdkit-common.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -60,11 +60,15 @@ extern "C" { #define NBDKIT_FLAG_MAY_TRIM (1<<0) /* Maps to !NBD_CMD_FLAG_NO_HOLE */ #define NBDKIT_FLAG_FUA (1<<1) /* Maps to NBD_CMD_FLAG_FUA */ +#define NBDKIT_FLAG_REQ_ONE (1<<2) /* Maps to NBD_CMD_FLAG_REQ_ONE */ #define NBDKIT_FUA_NONE 0 #define NBDKIT_FUA_EMULATE 1 #define NBDKIT_FUA_NATIVE 2 +#define NBDKIT_EXTENT_HOLE (1<<0) /* Same as NBD_STATE_HOLE */ +#define NBDKIT_EXTENT_ZERO (1<<1) /* Same as NBD_STATE_ZERO */ + extern void nbdkit_error (const char *msg, ...) ATTRIBUTE_FORMAT_PRINTF (1, 2); extern void nbdkit_verror (const char *msg, va_list args); extern void nbdkit_debug (const char *msg, ...) ATTRIBUTE_FORMAT_PRINTF (1, 2); @@ -76,6 +80,10 @@ extern int nbdkit_parse_bool (const char *str); extern int nbdkit_read_password (const char *value, char **password); extern char *nbdkit_realpath (const char *path); +struct nbdkit_extents; +extern int nbdkit_add_extent (struct nbdkit_extents *, + uint64_t offset, uint64_t length, uint32_t type); + /* A static non-NULL pointer which can be used when you don't need a * per-connection handle. */ diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h index 71c06c8..026145b 100644 --- a/include/nbdkit-filter.h +++ b/include/nbdkit-filter.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -44,6 +44,18 @@ extern "C" { #define NBDKIT_FILTER_API_VERSION 2 +struct nbdkit_extent { + uint64_t offset; + uint64_t length; + uint32_t type; +}; + +extern struct nbdkit_extents *nbdkit_extents_new (uint64_t start, uint64_t end); +extern void nbdkit_extents_free (struct nbdkit_extents *); +extern size_t nbdkit_extents_count (const struct nbdkit_extents *); +extern struct nbdkit_extent nbdkit_get_extent (const struct nbdkit_extents *, + size_t); + typedef int nbdkit_next_config (void *nxdata, const char *key, const char *value); typedef int nbdkit_next_config_complete (void *nxdata); @@ -58,6 +70,7 @@ struct nbdkit_next_ops { int (*is_rotational) (void *nxdata); int (*can_trim) (void *nxdata); int (*can_zero) (void *nxdata); + int (*can_extents) (void *nxdata); int (*can_fua) (void *nxdata); int (*can_multi_conn) (void *nxdata); @@ -71,6 +84,8 @@ struct nbdkit_next_ops { int *err); int (*zero) (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, int *err); + int (*extents) (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err); }; struct nbdkit_filter { @@ -120,6 +135,8 @@ struct nbdkit_filter { void *handle); int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_extents) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -140,6 +157,9 @@ struct nbdkit_filter { int (*zero) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, uint32_t count, uint64_t offset, uint32_t flags, int *err); + int (*extents) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err); }; #define NBDKIT_REGISTER_FILTER(filter) \ diff --git a/include/nbdkit-plugin.h b/include/nbdkit-plugin.h index d43b2f5..20d193c 100644 --- a/include/nbdkit-plugin.h +++ b/include/nbdkit-plugin.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -125,6 +125,10 @@ struct nbdkit_plugin { const char *magic_config_key; int (*can_multi_conn) (void *handle); + + int (*can_extents) (void *handle); + int (*extents) (void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents); }; extern void nbdkit_set_error (int err); diff --git a/server/internal.h b/server/internal.h index d40a82d..71d2815 100644 --- a/server/internal.h +++ b/server/internal.h @@ -274,6 +274,7 @@ struct backend { int (*is_rotational) (struct backend *, struct connection *conn); int (*can_trim) (struct backend *, struct connection *conn); int (*can_zero) (struct backend *, struct connection *conn); + int (*can_extents) (struct backend *, struct connection *conn); int (*can_fua) (struct backend *, struct connection *conn); int (*can_multi_conn) (struct backend *, struct connection *conn); @@ -287,6 +288,9 @@ struct backend { uint64_t offset, uint32_t flags, int *err); int (*zero) (struct backend *, struct connection *conn, uint32_t count, uint64_t offset, uint32_t flags, int *err); + int (*extents) (struct backend *, struct connection *conn, uint32_t count, + uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err); }; /* plugins.c */ diff --git a/server/extents.c b/server/extents.c new file mode 100644 index 0000000..549d39f --- /dev/null +++ b/server/extents.c @@ -0,0 +1,210 @@ +/* nbdkit + * Copyright (C) 2019 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <stdbool.h> +#include <string.h> +#include <inttypes.h> +#include <errno.h> +#include <assert.h> + +#include "minmax.h" + +#include "internal.h" + +struct nbdkit_extents { + struct nbdkit_extent *extents; + size_t nr_extents, allocated; + uint64_t start, end; /* end is one byte beyond the end of the range */ +}; + +struct nbdkit_extents * +nbdkit_extents_new (uint64_t start, uint64_t end) +{ + struct nbdkit_extents *r; + + if (start >= INT64_MAX || end >= INT64_MAX) { + nbdkit_error ("nbdkit_extents_new: " + "start (%" PRIu64 ") or end (%" PRIu64 ") >= INT64_MAX", + start, end); + errno = ERANGE; + return NULL; + } + + /* 0-length ranges are possible, so start == end is not an error. */ + if (start > end) { + nbdkit_error ("nbdkit_extents_new: " + "start (%" PRIu64 ") >= end (%" PRIu64 ")", + start, end); + errno = ERANGE; + return NULL; + } + + r = malloc (sizeof *r); + if (r == NULL) { + nbdkit_error ("nbdkit_extents_new: malloc: %m"); + return NULL; + } + r->extents = NULL; + r->nr_extents = r->allocated = 0; + r->start = start; + r->end = end; + return r; +} + +void +nbdkit_extents_free (struct nbdkit_extents *exts) +{ + if (exts) { + free (exts->extents); + free (exts); + } +} + +size_t +nbdkit_extents_count (const struct nbdkit_extents *exts) +{ + return exts->nr_extents; +} + +const struct nbdkit_extent +nbdkit_get_extent (const struct nbdkit_extents *exts, size_t i) +{ + assert (i < exts->nr_extents); + return exts->extents[i]; +} + +/* Insert *e in the list at the end. */ +static int +append_extent (struct nbdkit_extents *exts, const struct nbdkit_extent *e) +{ + if (exts->nr_extents >= exts->allocated) { + size_t new_allocated; + struct nbdkit_extent *new_extents; + + new_allocated = exts->allocated; + if (new_allocated == 0) + new_allocated = 1; + new_allocated *= 2; + new_extents + realloc (exts->extents, new_allocated * sizeof (struct nbdkit_extent)); + if (new_extents == NULL) { + nbdkit_error ("nbdkit_add_extent: realloc: %m"); + return -1; + } + exts->allocated = new_allocated; + exts->extents = new_extents; + } + + exts->extents[exts->nr_extents] = *e; + exts->nr_extents++; + return 0; +} + +int +nbdkit_add_extent (struct nbdkit_extents *exts, + uint64_t offset, uint64_t length, uint32_t type) +{ + uint64_t overlap; + + if (length == 0) + return 0; + + /* Ignore extents beyond the end of the range. Unfortunately we + * lost the information about whether this is contiguous with a + * previously added extent so we can't check for API usage errors. + */ + if (offset >= exts->end) + return 0; + + /* Shorten extents that overlap the end of the range. */ + if (offset + length >= exts->end) { + overlap = offset + length - exts->end; + length -= overlap; + } + + /* If there are existing extents, the new extent must be contiguous. */ + if (exts->nr_extents > 0) { + const struct nbdkit_extent *ep; + + ep = &exts->extents[exts->nr_extents-1]; + if (offset != ep->offset + ep->length) { + nbdkit_error ("nbdkit_add_extent: " + "extents must be added in ascending order and " + "must be contiguous"); + return -1; + } + } + else { + /* If there are no existing extents, and the new extent is + * entirely before start, ignore it. + */ + if (offset + length <= exts->start) + return 0; + + /* If there are no existing extents, and the new extent is after + * start, then this is a bug in the plugin. + */ + if (offset > exts->start) { + nbdkit_error ("nbdkit_add_extent: " + "first extent must not be > start (%" PRIu64 ")", + exts->start); + return -1; + } + + /* If there are no existing extents, and the new extent overlaps + * start, truncate it so it starts at start. + */ + overlap = exts->start - offset; + length -= overlap; + offset += overlap; + } + + /* If we get here we are going to either add or extend. */ + if (exts->nr_extents > 0 && + exts->extents[exts->nr_extents-1].type == type) { + /* Coalesce with the last extent. */ + exts->extents[exts->nr_extents-1].length += length; + return 0; + } + else { + /* Add a new extent. */ + const struct nbdkit_extent e + { .offset = offset, .length = length, .type = type }; + return append_extent (exts, &e); + } +} diff --git a/server/filters.c b/server/filters.c index 5b7abc4..5095188 100644 --- a/server/filters.c +++ b/server/filters.c @@ -309,6 +309,13 @@ next_can_zero (void *nxdata) return b_conn->b->can_zero (b_conn->b, b_conn->conn); } +static int +next_can_extents (void *nxdata) +{ + struct b_conn *b_conn = nxdata; + return b_conn->b->can_extents (b_conn->b, b_conn->conn); +} + static int next_can_fua (void *nxdata) { @@ -364,6 +371,15 @@ next_zero (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, return b_conn->b->zero (b_conn->b, b_conn->conn, count, offset, flags, err); } +static int +next_extents (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + struct b_conn *b_conn = nxdata; + return b_conn->b->extents (b_conn->b, b_conn->conn, count, offset, flags, + extents, err); +} + static struct nbdkit_next_ops next_ops = { .get_size = next_get_size, .can_write = next_can_write, @@ -371,6 +387,7 @@ static struct nbdkit_next_ops next_ops = { .is_rotational = next_is_rotational, .can_trim = next_can_trim, .can_zero = next_can_zero, + .can_extents = next_can_extents, .can_fua = next_can_fua, .can_multi_conn = next_can_multi_conn, .pread = next_pread, @@ -378,6 +395,7 @@ static struct nbdkit_next_ops next_ops = { .flush = next_flush, .trim = next_trim, .zero = next_zero, + .extents = next_extents, }; static int @@ -511,6 +529,21 @@ filter_can_zero (struct backend *b, struct connection *conn) return f->backend.next->can_zero (f->backend.next, conn); } +static int +filter_can_extents (struct backend *b, struct connection *conn) +{ + struct backend_filter *f = container_of (b, struct backend_filter, backend); + void *handle = connection_get_handle (conn, f->backend.i); + struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; + + debug ("%s: can_extents", f->name); + + if (f->filter.can_extents) + return f->filter.can_extents (&next_ops, &nxdata, handle); + else + return f->backend.next->can_extents (f->backend.next, conn); +} + static int filter_can_fua (struct backend *b, struct connection *conn) { @@ -646,6 +679,30 @@ filter_zero (struct backend *b, struct connection *conn, count, offset, flags, err); } +static int +filter_extents (struct backend *b, struct connection *conn, + uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + struct backend_filter *f = container_of (b, struct backend_filter, backend); + void *handle = connection_get_handle (conn, f->backend.i); + struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; + + assert (!(flags & ~NBDKIT_FLAG_REQ_ONE)); + + debug ("%s: extents count=%" PRIu32 " offset=%" PRIu64 " flags=0x%" PRIx32, + f->name, count, offset, flags); + + if (f->filter.extents) + return f->filter.extents (&next_ops, &nxdata, handle, + count, offset, flags, + extents, err); + else + return f->backend.next->extents (f->backend.next, conn, + count, offset, flags, + extents, err); +} + static struct backend filter_functions = { .free = filter_free, .thread_model = filter_thread_model, @@ -667,6 +724,7 @@ static struct backend filter_functions = { .is_rotational = filter_is_rotational, .can_trim = filter_can_trim, .can_zero = filter_can_zero, + .can_extents = filter_can_extents, .can_fua = filter_can_fua, .can_multi_conn = filter_can_multi_conn, .pread = filter_pread, @@ -674,6 +732,7 @@ static struct backend filter_functions = { .flush = filter_flush, .trim = filter_trim, .zero = filter_zero, + .extents = filter_extents, }; /* Register and load a filter. */ diff --git a/server/plugins.c b/server/plugins.c index 28b96ad..0b0fe77 100644 --- a/server/plugins.c +++ b/server/plugins.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -199,6 +199,8 @@ plugin_dump_fields (struct backend *b) HAS (trim); HAS (zero); HAS (can_multi_conn); + HAS (can_extents); + HAS (extents); #undef HAS /* Custom fields. */ @@ -392,6 +394,21 @@ plugin_can_zero (struct backend *b, struct connection *conn) return plugin_can_write (b, conn); } +static int +plugin_can_extents (struct backend *b, struct connection *conn) +{ + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + + assert (connection_get_handle (conn, 0)); + + debug ("can_extents"); + + if (p->plugin.can_extents) + return p->plugin.can_extents (connection_get_handle (conn, 0)); + else + return p->plugin.extents != NULL; +} + static int plugin_can_fua (struct backend *b, struct connection *conn) { @@ -651,6 +668,39 @@ plugin_zero (struct backend *b, struct connection *conn, return r; } +static int +plugin_extents (struct backend *b, struct connection *conn, + uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + bool req_one = flags & NBDKIT_FLAG_REQ_ONE; + int r; + + assert (connection_get_handle (conn, 0)); + assert (!(flags & ~NBDKIT_FLAG_REQ_ONE)); + + /* This should be true because plugin_can_extents checks it. */ + assert (p->plugin.extents); + + debug ("extents count=%" PRIu32 " offset=%" PRIu64 " req_one=%d", + count, offset, req_one); + + if (!count) + return 0; + + r = p->plugin.extents (connection_get_handle (conn, 0), count, offset, + flags, extents); + if (r >= 0 && nbdkit_extents_count (extents) < 1) { + nbdkit_error ("extents: plugin must return at least one extent"); + nbdkit_set_error (EINVAL); + r = -1; + } + if (r == -1) + *err = get_error (p); + return r; +} + static struct backend plugin_functions = { .free = plugin_free, .thread_model = plugin_thread_model, @@ -672,6 +722,7 @@ static struct backend plugin_functions = { .is_rotational = plugin_is_rotational, .can_trim = plugin_can_trim, .can_zero = plugin_can_zero, + .can_extents = plugin_can_extents, .can_fua = plugin_can_fua, .can_multi_conn = plugin_can_multi_conn, .pread = plugin_pread, @@ -679,6 +730,7 @@ static struct backend plugin_functions = { .flush = plugin_flush, .trim = plugin_trim, .zero = plugin_zero, + .extents = plugin_extents, }; /* Register and load a plugin. */ diff --git a/server/Makefile.am b/server/Makefile.am index be70700..5ee662e 100644 --- a/server/Makefile.am +++ b/server/Makefile.am @@ -43,6 +43,7 @@ nbdkit_SOURCES = \ connections.c \ crypto.c \ debug.c \ + extents.c \ filters.c \ internal.h \ locks.c \ diff --git a/server/nbdkit.syms b/server/nbdkit.syms index 672abd2..240953e 100644 --- a/server/nbdkit.syms +++ b/server/nbdkit.syms @@ -40,8 +40,13 @@ # The functions we want plugins and filters to call. global: nbdkit_absolute_path; + nbdkit_add_extent; nbdkit_debug; nbdkit_error; + nbdkit_extents_count; + nbdkit_extents_free; + nbdkit_extents_new; + nbdkit_get_extent; nbdkit_parse_bool; nbdkit_parse_size; nbdkit_read_password; -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 02/19] server: Add CLEANUP_EXTENTS_FREE macro.
Provides automatic cleanup of ‘struct nbdkit_extents_map *’ on exit from a scope or function. --- server/internal.h | 2 ++ server/cleanup.c | 8 +++++++- server/Makefile.am | 3 ++- 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/server/internal.h b/server/internal.h index 71d2815..ae51804 100644 --- a/server/internal.h +++ b/server/internal.h @@ -139,6 +139,8 @@ extern void change_user (void); /* cleanup.c */ extern void cleanup_free (void *ptr); #define CLEANUP_FREE __attribute__((cleanup (cleanup_free))) +extern void cleanup_extents_free (void *ptr); +#define CLEANUP_EXTENTS_FREE __attribute__((cleanup (cleanup_extents_free))) extern void cleanup_unlock (pthread_mutex_t **ptr); #define CLEANUP_UNLOCK __attribute__((cleanup (cleanup_unlock))) #define ACQUIRE_LOCK_FOR_CURRENT_SCOPE(mutex) \ diff --git a/server/cleanup.c b/server/cleanup.c index c29ecec..1eec613 100644 --- a/server/cleanup.c +++ b/server/cleanup.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -46,6 +46,12 @@ cleanup_free (void *ptr) free (* (void **) ptr); } +void +cleanup_extents_free (void *ptr) +{ + nbdkit_extents_free (* (void **) ptr); +} + void cleanup_unlock (pthread_mutex_t **ptr) { diff --git a/server/Makefile.am b/server/Makefile.am index 5ee662e..d50ebe4 100644 --- a/server/Makefile.am +++ b/server/Makefile.am @@ -125,7 +125,8 @@ check_PROGRAMS = test-utils test_utils_SOURCES = \ test-utils.c \ utils.c \ - cleanup.c + cleanup.c \ + extents.c test_utils_CPPFLAGS = \ -I$(top_srcdir)/include \ -I$(top_srcdir)/common/include -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 03/19] server: protocol: Implement Block Status "base:allocation".
This commit implements the protocol side of supporting NBD_CMD_BLOCK_STATUS to read "base:allocation" information about disks. The practical result of this is that clients are able to read which parts of a disk are allocated, holes or zeroes. --- docs/nbdkit-protocol.pod | 5 +- common/protocol/protocol.h | 15 ++ server/internal.h | 8 + server/protocol-handshake-newstyle.c | 63 ++++++- server/protocol-handshake.c | 11 ++ server/protocol.c | 240 +++++++++++++++++++++++++-- TODO | 4 +- 7 files changed, 328 insertions(+), 18 deletions(-) diff --git a/docs/nbdkit-protocol.pod b/docs/nbdkit-protocol.pod index 4c65e00..a9a3390 100644 --- a/docs/nbdkit-protocol.pod +++ b/docs/nbdkit-protocol.pod @@ -134,7 +134,10 @@ Supported in nbdkit E<ge> 1.11.8. =item Block Status -I<Not supported>. +Supported in nbdkit E<ge> 1.11.10. + +Only C<base:allocation> (ie. querying which parts of an image are +sparse) is supported. =item Resize Extension diff --git a/common/protocol/protocol.h b/common/protocol/protocol.h index 06b917e..a7de2f0 100644 --- a/common/protocol/protocol.h +++ b/common/protocol/protocol.h @@ -112,6 +112,7 @@ extern const char *name_of_nbd_rep (int); #define NBD_REP_ACK 1 #define NBD_REP_SERVER 2 #define NBD_REP_INFO 3 +#define NBD_REP_META_CONTEXT 4 #define NBD_REP_ERR_UNSUP 0x80000001 #define NBD_REP_ERR_POLICY 0x80000002 #define NBD_REP_ERR_INVALID 0x80000003 @@ -128,6 +129,18 @@ struct fixed_new_option_reply_info_export { uint16_t eflags; /* per-export flags */ } __attribute__((packed)); +/* NBD_REP_META_CONTEXT reply (follows fixed_new_option_reply). */ +struct fixed_new_option_reply_meta_context { + uint32_t context_id; /* metadata context ID */ + /* followed by a string */ +} __attribute__((packed)); + +/* NBD_REPLY_TYPE_BLOCK_STATUS block descriptor. */ +struct block_descriptor { + uint32_t length; /* length of block */ + uint32_t status_flags; /* block type (hole etc) */ +} __attribute__((packed)); + /* New-style handshake server reply when using NBD_OPT_EXPORT_NAME. * Modern clients use NBD_OPT_GO instead of this. */ @@ -199,10 +212,12 @@ extern const char *name_of_nbd_cmd (int); #define NBD_CMD_FLUSH 3 #define NBD_CMD_TRIM 4 #define NBD_CMD_WRITE_ZEROES 6 +#define NBD_CMD_BLOCK_STATUS 7 extern const char *name_of_nbd_cmd_flag (int); #define NBD_CMD_FLAG_FUA (1<<0) #define NBD_CMD_FLAG_NO_HOLE (1<<1) +#define NBD_CMD_FLAG_REQ_ONE (1<<3) /* Error codes (previously errno). * See http://git.qemu.org/?p=qemu.git;a=commitdiff;h=ca4414804114fd0095b317785bc0b51862e62ebb diff --git a/server/internal.h b/server/internal.h index ae51804..d804441 100644 --- a/server/internal.h +++ b/server/internal.h @@ -181,8 +181,10 @@ struct connection { bool can_zero; bool can_fua; bool can_multi_conn; + bool can_extents; bool using_tls; bool structured_replies; + bool meta_context_base_allocation; int sockin, sockout; connection_recv_function recv; @@ -219,6 +221,12 @@ extern int protocol_handshake_newstyle (struct connection *conn) extern int protocol_recv_request_send_reply (struct connection *conn) __attribute__((__nonnull__ (1))); +/* The context ID of base:allocation. As far as I can tell it doesn't + * matter what this is as long as nbdkit always returns the same + * number. + */ +#define base_allocation_id 1 + /* crypto.c */ #define root_tls_certificates_dir sysconfdir "/pki/" PACKAGE_NAME extern void crypto_init (bool tls_set_on_cli); diff --git a/server/protocol-handshake-newstyle.c b/server/protocol-handshake-newstyle.c index db01f7b..6899e6c 100644 --- a/server/protocol-handshake-newstyle.c +++ b/server/protocol-handshake-newstyle.c @@ -133,6 +133,34 @@ send_newstyle_option_reply_info_export (struct connection *conn, return 0; } +static int +send_newstyle_option_reply_meta_context (struct connection *conn, + uint32_t option, uint32_t reply, + uint32_t context_id, + const char *name) +{ + struct fixed_new_option_reply fixed_new_option_reply; + struct fixed_new_option_reply_meta_context context; + const size_t namelen = strlen (name); + + fixed_new_option_reply.magic = htobe64 (NBD_REP_MAGIC); + fixed_new_option_reply.option = htobe32 (option); + fixed_new_option_reply.reply = htobe32 (reply); + fixed_new_option_reply.replylen = htobe32 (sizeof context + namelen); + context.context_id = htobe32 (context_id); + + if (conn->send (conn, + &fixed_new_option_reply, + sizeof fixed_new_option_reply) == -1 || + conn->send (conn, &context, sizeof context) == -1 || + conn->send (conn, name, namelen) == -1) { + nbdkit_error ("write: %m"); + return -1; + } + + return 0; +} + /* Sub-function during negotiate_handshake_newstyle, to uniformly handle * a client hanging up on a message boundary. */ @@ -503,7 +531,15 @@ negotiate_handshake_newstyle_options (struct connection *conn) * for SET: nr_queries == 0 means reset all contexts */ if (nr_queries == 0) { - /* Nothing is supported now. */ + if (option == NBD_OPT_SET_META_CONTEXT) + conn->meta_context_base_allocation = false; + else /* LIST */ { + if (send_newstyle_option_reply_meta_context + (conn, option, NBD_REP_META_CONTEXT, + 0, "base:allocation") == -1) + return -1; + } + if (send_newstyle_option_reply (conn, option, NBD_REP_ACK) == -1) return -1; } @@ -525,7 +561,30 @@ negotiate_handshake_newstyle_options (struct connection *conn) option == NBD_OPT_LIST_META_CONTEXT ? "query" : "set", (int) querylen, &data[opt_index]); - /* Ignore query - nothing is supported. */ + /* For LIST, "base:" returns all supported contexts in the + * base namespace. We only support "base:allocation". + */ + if (option == NBD_OPT_LIST_META_CONTEXT && + querylen == 5 && + strncmp (&data[opt_index], "base:", 5) == 0) { + if (send_newstyle_option_reply_meta_context + (conn, option, NBD_REP_META_CONTEXT, + 0, "base:allocation") == -1) + return -1; + } + /* "base:allocation" requested by name. */ + else if (querylen == 15 && + strncmp (&data[opt_index], "base:allocation", 15) == 0) { + if (send_newstyle_option_reply_meta_context + (conn, option, NBD_REP_META_CONTEXT, + option == NBD_OPT_SET_META_CONTEXT + ? base_allocation_id : 0, + "base:allocation") == -1) + return -1; + if (option == NBD_OPT_SET_META_CONTEXT) + conn->meta_context_base_allocation = true; + } + /* Every other query must be ignored. */ opt_index += querylen; nr_queries--; diff --git a/server/protocol-handshake.c b/server/protocol-handshake.c index 79a5999..9653210 100644 --- a/server/protocol-handshake.c +++ b/server/protocol-handshake.c @@ -110,6 +110,17 @@ protocol_compute_eflags (struct connection *conn, uint16_t *flags) conn->can_multi_conn = true; } + /* The result of this is not returned to callers here (or at any + * time during the handshake). However it makes sense to do it once + * per connection and store the result in the handle anyway. This + * protocol_compute_eflags function is a bit misnamed XXX. + */ + fl = backend->can_extents (backend, conn); + if (fl == -1) + return -1; + if (fl) + conn->can_extents = true; + *flags = eflags; return 0; } diff --git a/server/protocol.c b/server/protocol.c index f117d42..383938f 100644 --- a/server/protocol.c +++ b/server/protocol.c @@ -36,6 +36,7 @@ #include <stdio.h> #include <stdlib.h> #include <stdint.h> +#include <stdbool.h> #include <inttypes.h> #include <string.h> #include <unistd.h> @@ -44,6 +45,7 @@ #include "internal.h" #include "byte-swapping.h" +#include "minmax.h" #include "protocol.h" /* Maximum read or write request that we will handle. */ @@ -78,6 +80,7 @@ validate_request (struct connection *conn, case NBD_CMD_WRITE: case NBD_CMD_TRIM: case NBD_CMD_WRITE_ZEROES: + case NBD_CMD_BLOCK_STATUS: if (!valid_range (conn, offset, count)) { /* XXX Allow writes to extend the disk? */ nbdkit_error ("invalid request: %s: offset and count are out of range: " @@ -106,7 +109,8 @@ validate_request (struct connection *conn, } /* Validate flags */ - if (flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) { + if (flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE | + NBD_CMD_FLAG_REQ_ONE)) { nbdkit_error ("invalid request: unknown flag (0x%x)", flags); *error = EINVAL; return false; @@ -117,6 +121,12 @@ validate_request (struct connection *conn, *error = EINVAL; return false; } + if ((flags & NBD_CMD_FLAG_REQ_ONE) && + cmd != NBD_CMD_BLOCK_STATUS) { + nbdkit_error ("invalid request: REQ_ONE flag needs BLOCK_STATUS request"); + *error = EINVAL; + return false; + } if (!conn->can_fua && (flags & NBD_CMD_FLAG_FUA)) { nbdkit_error ("invalid request: FUA flag not supported"); *error = EINVAL; @@ -157,14 +167,37 @@ validate_request (struct connection *conn, return false; } + /* Block status allowed? */ + if (cmd == NBD_CMD_BLOCK_STATUS) { + if (!conn->structured_replies) { + nbdkit_error ("invalid request: " + "%s: structured replies was not negotiated", + name_of_nbd_cmd (cmd)); + *error = EINVAL; + return false; + } + if (!conn->meta_context_base_allocation) { + nbdkit_error ("invalid request: " + "%s: base:allocation was not negotiated", + name_of_nbd_cmd (cmd)); + *error = EINVAL; + return false; + } + } + return true; /* Command validates. */ } /* This is called with the request lock held to actually execute the * request (by calling the plugin). Note that the request fields have * been validated already in 'validate_request' so we don't have to - * check them again. 'buf' is either the data to be written or the - * data to be returned, and points to a buffer of size 'count' bytes. + * check them again. + * + * 'buf' is either the data to be written or the data to be returned, + * and points to a buffer of size 'count' bytes. + * + * 'extents' is an empty extents list used for block status requests + * only. * * In all cases, the return value is the system errno value that will * later be converted to the nbd error to send back to the client (0 @@ -173,7 +206,7 @@ validate_request (struct connection *conn, static uint32_t handle_request (struct connection *conn, uint16_t cmd, uint16_t flags, uint64_t offset, uint32_t count, - void *buf) + void *buf, struct nbdkit_extents *extents) { uint32_t f = 0; bool fua = conn->can_fua && (flags & NBD_CMD_FLAG_FUA); @@ -217,6 +250,33 @@ handle_request (struct connection *conn, return err; break; + case NBD_CMD_BLOCK_STATUS: + /* The other backend methods don't check can_*. That is because + * those methods are implicitly suppressed by returning eflags to + * the client. However there is no eflag for extents so we must + * check it here. + */ + if (conn->can_extents) { + if (flags & NBD_CMD_FLAG_REQ_ONE) + f |= NBDKIT_FLAG_REQ_ONE; + if (backend->extents (backend, conn, count, offset, f, + extents, &err) == -1) + return err; + } + else { + int r; + + /* By default it is safe assume that everything in the range is + * allocated. + */ + errno = 0; + r = nbdkit_add_extent (extents, offset, count, 0 /* allocated data */); + if (r == -1) + return errno ? errno : EINVAL; + return 0; + } + break; + default: abort (); } @@ -359,6 +419,143 @@ send_structured_reply_read (struct connection *conn, return 1; /* command processed ok */ } +/* Convert a list of extents into NBD_REPLY_TYPE_BLOCK_STATUS blocks. + * The rules here are very complicated. Read the spec carefully! + */ +static struct block_descriptor * +extents_to_block_descriptors (struct nbdkit_extents *extents, + uint16_t flags, + uint32_t count, uint64_t offset, + size_t *nr_blocks) +{ + const bool req_one = flags & NBD_CMD_FLAG_REQ_ONE; + const size_t nr_extents = nbdkit_extents_count (extents); + size_t i; + struct block_descriptor *blocks; + + /* This is checked in server/plugins.c. */ + assert (nr_extents >= 1); + + /* We may send fewer than nr_extents blocks, but never more. */ + blocks = calloc (req_one ? 1 : nr_extents, sizeof (struct block_descriptor)); + if (blocks == NULL) { + nbdkit_error ("malloc"); + return NULL; + } + + if (req_one) { + const struct nbdkit_extent e = nbdkit_get_extent (extents, 0); + + /* Checked as a side effect of how the extent list is created. */ + assert (e.length > 0); + + *nr_blocks = 1; + + /* Must not exceed count of the original request. */ + blocks[0].length = MIN (e.length, (uint64_t) count); + blocks[0].status_flags = e.type & 3; + } + else { + uint64_t pos = offset; + + for (i = 0; i < nr_extents; ++i) { + const struct nbdkit_extent e = nbdkit_get_extent (extents, i); + uint64_t length; + + if (i == 0) + assert (e.offset == offset); + + /* Must not exceed UINT32_MAX. */ + length = MIN (e.length, UINT32_MAX); + blocks[i].status_flags = e.type & 3; + + pos += length; + if (pos > offset + count) /* this must be the last block */ + break; + + /* If we reach here then we must have consumed this whole + * extent. This is currently true because the server only sends + * 32 bit requests, but if we move to 64 bit requests we will + * need to revisit this code so it can split extents into + * multiple blocks. XXX + */ + assert (e.length <= length); + } + + *nr_blocks = i; + } + +#if 0 + for (i = 0; i < *nr_blocks; ++i) + nbdkit_debug ("block status: sending block %" PRIu32 " type %" PRIu32, + blocks[i].length, blocks[i].status_flags); +#endif + + /* Convert to big endian for the protocol. */ + for (i = 0; i < *nr_blocks; ++i) { + blocks[i].length = htobe32 (blocks[i].length); + blocks[i].status_flags = htobe32 (blocks[i].status_flags); + } + + return blocks; +} + +static int +send_structured_reply_block_status (struct connection *conn, + uint64_t handle, + uint16_t cmd, uint16_t flags, + uint32_t count, uint64_t offset, + struct nbdkit_extents *extents) +{ + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&conn->write_lock); + struct structured_reply reply; + CLEANUP_FREE struct block_descriptor *blocks = NULL; + size_t nr_blocks; + uint32_t context_id; + size_t i; + int r; + + assert (conn->meta_context_base_allocation); + assert (cmd == NBD_CMD_BLOCK_STATUS); + + blocks = extents_to_block_descriptors (extents, flags, count, offset, + &nr_blocks); + if (blocks == NULL) + return connection_set_status (conn, -1); + + reply.magic = htobe32 (NBD_STRUCTURED_REPLY_MAGIC); + reply.handle = handle; + reply.flags = htobe16 (NBD_REPLY_FLAG_DONE); + reply.type = htobe16 (NBD_REPLY_TYPE_BLOCK_STATUS); + reply.length = htobe32 (sizeof context_id + + nr_blocks * sizeof (struct block_descriptor)); + + r = conn->send (conn, &reply, sizeof reply); + if (r == -1) { + nbdkit_error ("write reply: %s: %m", name_of_nbd_cmd (cmd)); + return connection_set_status (conn, -1); + } + + /* Send the base:allocation context ID. */ + context_id = htobe32 (base_allocation_id); + r = conn->send (conn, &context_id, sizeof context_id); + if (r == -1) { + nbdkit_error ("write reply: %s: %m", name_of_nbd_cmd (cmd)); + return connection_set_status (conn, -1); + } + + /* Send each block descriptor. */ + for (i = 0; i < nr_blocks; ++i) { + r = conn->send (conn, &blocks[i], sizeof blocks[i]); + if (r == -1) { + nbdkit_error ("write reply: %s: %m", name_of_nbd_cmd (cmd)); + return connection_set_status (conn, -1); + } + } + + return 1; /* command processed ok */ +} + static int send_structured_reply_error (struct connection *conn, uint64_t handle, uint16_t cmd, uint32_t error) @@ -402,6 +599,7 @@ protocol_recv_request_send_reply (struct connection *conn) uint32_t magic, count, error = 0; uint64_t offset; CLEANUP_FREE char *buf = NULL; + CLEANUP_EXTENTS_FREE struct nbdkit_extents *extents = NULL; /* Read the request packet. */ { @@ -449,6 +647,7 @@ protocol_recv_request_send_reply (struct connection *conn) if (cmd == NBD_CMD_READ || cmd == NBD_CMD_WRITE) { buf = malloc (count); if (buf == NULL) { + out_of_memory: perror ("malloc"); error = ENOMEM; if (cmd == NBD_CMD_WRITE && @@ -458,6 +657,13 @@ protocol_recv_request_send_reply (struct connection *conn) } } + /* Allocate the extents list for block status only. */ + if (cmd == NBD_CMD_BLOCK_STATUS) { + extents = nbdkit_extents_new (offset, conn->exportsize); + if (extents == NULL) + goto out_of_memory; + } + /* Receive the write data buffer. */ if (cmd == NBD_CMD_WRITE) { r = conn->recv (conn, buf, count); @@ -478,7 +684,7 @@ protocol_recv_request_send_reply (struct connection *conn) } else { lock_request (conn); - error = handle_request (conn, cmd, flags, offset, count, buf); + error = handle_request (conn, cmd, flags, offset, count, buf, extents); assert ((int) error >= 0); unlock_request (conn); } @@ -498,15 +704,23 @@ protocol_recv_request_send_reply (struct connection *conn) } /* Currently we prefer to send simple replies for everything except - * where we have to (ie. NBD_CMD_READ when structured_replies have - * been negotiated). However this prevents us from sending - * human-readable error messages to the client, so we should - * reconsider this in future. + * where we have to (ie. NBD_CMD_READ and NBD_CMD_BLOCK_STATUS when + * structured_replies have been negotiated). However this prevents + * us from sending human-readable error messages to the client, so + * we should reconsider this in future. */ - if (conn->structured_replies && cmd == NBD_CMD_READ) { - if (!error) - return send_structured_reply_read (conn, request.handle, cmd, - buf, count, offset); + if (conn->structured_replies && + (cmd == NBD_CMD_READ || cmd == NBD_CMD_BLOCK_STATUS)) { + if (!error) { + if (cmd == NBD_CMD_READ) + return send_structured_reply_read (conn, request.handle, cmd, + buf, count, offset); + else /* NBD_CMD_BLOCK_STATUS */ + return send_structured_reply_block_status (conn, request.handle, + cmd, flags, + count, offset, + extents); + } else return send_structured_reply_error (conn, request.handle, cmd, error); } diff --git a/TODO b/TODO index 7cbc238..c968c2d 100644 --- a/TODO +++ b/TODO @@ -24,8 +24,8 @@ General ideas for improvements to inform nbdkit when the response is ready: https://www.redhat.com/archives/libguestfs/2018-January/msg00149.html -* More NBD protocol features. In the upstream pipeline: proposals for - block status and online resize. +* More NBD protocol features. The only currently missing feature is + online resize. * Add a callback to let plugins request minimum alignment for the buffer to pread/pwrite; useful for a plugin utilizing O_DIRECT or -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 04/19] blocksize: Implement extents.
--- filters/blocksize/nbdkit-blocksize-filter.pod | 4 ++-- filters/blocksize/blocksize.c | 19 +++++++++++++++++++ 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/filters/blocksize/nbdkit-blocksize-filter.pod b/filters/blocksize/nbdkit-blocksize-filter.pod index b6cf4dd..0abed2f 100644 --- a/filters/blocksize/nbdkit-blocksize-filter.pod +++ b/filters/blocksize/nbdkit-blocksize-filter.pod @@ -56,8 +56,8 @@ of 1024. =item B<maxlen=>SIZE -The maximum length for any single transaction without data (trim and -zero). If omitted, this defaults to 0xffffffff rounded down to +The maximum length for any single transaction without data (trim, zero +or extents). If omitted, this defaults to 0xffffffff rounded down to C<minsize> alignment (that is, the inherent 32-bit limit of the NBD protocol). This need not be a power of two, but must be an integer multiple of C<minblock>, and should be at least as large as diff --git a/filters/blocksize/blocksize.c b/filters/blocksize/blocksize.c index 9e0e713..55b2d59 100644 --- a/filters/blocksize/blocksize.c +++ b/filters/blocksize/blocksize.c @@ -45,6 +45,7 @@ #include <nbdkit-filter.h> #include "minmax.h" +#include "rounding.h" /* XXX See design comment in filters/cow/cow.c. */ #define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_ALL_REQUESTS @@ -372,6 +373,23 @@ blocksize_zero (struct nbdkit_next_ops *next_ops, void *nxdata, return 0; } +static int +blocksize_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents, int *err) +{ + /* Ask the plugin for blocksize-aligned data. Since the extents + * list start is set to the real offset, everything before the + * offset is ignored automatically. Also we only need to ask for + * maxlen of data, because it's fine to return less than the full + * count as long as we're making progress. + */ + return next_ops->extents (nxdata, + MIN (count, maxlen), + ROUND_DOWN (offset, minblock), + flags, extents, err); +} + static struct nbdkit_filter filter = { .name = "blocksize", .longname = "nbdkit blocksize filter", @@ -386,6 +404,7 @@ static struct nbdkit_filter filter = { .pwrite = blocksize_pwrite, .trim = blocksize_trim, .zero = blocksize_zero, + .extents = blocksize_extents, }; NBDKIT_REGISTER_FILTER(filter) -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 05/19] cache: Add a comment about possible future handling of extents.
This filter should be correct without intercepting the .extents or .can_extents callbacks, so we don't need to implement this now. Thanks: Eric Blake. --- filters/cache/blk.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/filters/cache/blk.c b/filters/cache/blk.c index d708a54..56c6ab9 100644 --- a/filters/cache/blk.c +++ b/filters/cache/blk.c @@ -69,10 +69,17 @@ static int fd = -1; * 10 = <unused> * 11 = block cached and dirty * - * Idea for future enhancement: Use 10 (currently unused) to store - * blocks which are known to be zero and are not allocated in the - * cache layer. You can set this when clients call cache_zero and - * defer calling plugin->zero until flush. + * Future enhancement: + * + * We need to cache information about holes, ie. blocks which read as + * zeroes but are not explicitly stored in the cache. This + * information could be set when clients call cache_zero (and defer + * calling plugin->zero until flush). The information could also + * interact with extents, so when plugin->extents returns information + * that a hole exists we can record this information in the cache and + * not have to query the plugin a second time (especially useful for + * VDDK where querying extents is slow, and for qemu which [in 2019] + * repeatedly requests the same information with REQ_ONE set). */ static struct bitmap bm; -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 06/19] cow: Disable extents information in this filter.
The cow filter doesn't support trimming and we should assume the underlying plugin is fully allocated too. Note that both of these limitations might be lifted with a more advanced filter implementation. --- filters/cow/cow.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/filters/cow/cow.c b/filters/cow/cow.c index b8f08c4..570a65f 100644 --- a/filters/cow/cow.c +++ b/filters/cow/cow.c @@ -114,7 +114,7 @@ cow_prepare (struct nbdkit_next_ops *next_ops, void *nxdata, } /* Whatever the underlying plugin can or can't do, we can write, we - * cannot trim, and we can flush. + * cannot trim or detect extents, and we can flush. */ static int cow_can_write (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) @@ -128,6 +128,12 @@ cow_can_trim (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) return 0; } +static int +cow_can_extents (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) +{ + return 0; +} + static int cow_can_flush (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) { @@ -316,6 +322,7 @@ static struct nbdkit_filter filter = { .can_write = cow_can_write, .can_flush = cow_can_flush, .can_trim = cow_can_trim, + .can_extents = cow_can_extents, .can_fua = cow_can_fua, .pread = cow_pread, .pwrite = cow_pwrite, -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 07/19] delay: Allow block status (extents) requests to be separately delayed.
--- filters/delay/nbdkit-delay-filter.pod | 8 ++++++++ filters/delay/delay.c | 26 ++++++++++++++++++++++++++ 2 files changed, 34 insertions(+) diff --git a/filters/delay/nbdkit-delay-filter.pod b/filters/delay/nbdkit-delay-filter.pod index c2eb172..2e2ac74 100644 --- a/filters/delay/nbdkit-delay-filter.pod +++ b/filters/delay/nbdkit-delay-filter.pod @@ -11,6 +11,7 @@ nbdkit-delay-filter - nbdkit delay filter nbdkit --filter=delay plugin [plugin-args ...] delay-read=(SECS|NNms) delay-write=(SECS|NNms) delay-zero=(SECS|NNms) delay-trim=(SECS|NNms) + delay-extents=(SECS|NNms) =head1 DESCRIPTION @@ -63,6 +64,13 @@ Delay zero operations by C<SECS> seconds or C<NN> milliseconds. Delay trim/discard operations by C<SECS> seconds or C<NN> milliseconds. +=item B<delay-extents=>SECS + +=item B<delay-extents=>NNB<ms> + +Delay block status (extents) operations by C<SECS> seconds or C<NN> +milliseconds. + =item B<wdelay=>SECS =item B<wdelay=>NNB<ms> diff --git a/filters/delay/delay.c b/filters/delay/delay.c index a00292b..d11958b 100644 --- a/filters/delay/delay.c +++ b/filters/delay/delay.c @@ -47,6 +47,7 @@ static int delay_read_ms = 0; /* read delay (milliseconds) */ static int delay_write_ms = 0; /* write delay (milliseconds) */ static int delay_zero_ms = 0; /* zero delay (milliseconds) */ static int delay_trim_ms = 0; /* trim delay (milliseconds) */ +static int delay_extents_ms = 0;/* extents delay (milliseconds) */ static int parse_delay (const char *key, const char *value) @@ -110,6 +111,12 @@ trim_delay (void) delay (delay_trim_ms); } +static void +extents_delay (void) +{ + delay (delay_extents_ms); +} + /* Called for each key=value passed on the command line. */ static int delay_config (nbdkit_next_config *next, void *nxdata, @@ -154,6 +161,13 @@ delay_config (nbdkit_next_config *next, void *nxdata, return -1; return 0; } + else if (strcmp (key, "delay-extent") == 0 || + strcmp (key, "delay-extents") == 0) { + delay_extents_ms = parse_delay (key, value); + if (delay_extents_ms == -1) + return -1; + return 0; + } else return next (nxdata, key, value); } @@ -164,6 +178,7 @@ delay_config (nbdkit_next_config *next, void *nxdata, "delay-write=<NN>[ms] Write delay in seconds/milliseconds.\n" \ "delay-zero=<NN>[ms] Zero delay in seconds/milliseconds.\n" \ "delay-trim=<NN>[ms] Trim delay in seconds/milliseconds.\n" \ + "delay-extents=<NN>[ms] Extents delay in seconds/milliseconds.\n" \ "wdelay=<NN>[ms] Write, zero and trim delay in secs/msecs." /* Read data. */ @@ -207,6 +222,16 @@ delay_trim (struct nbdkit_next_ops *next_ops, void *nxdata, return next_ops->trim (nxdata, count, offset, flags, err); } +/* Extents. */ +static int +delay_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + extents_delay (); + return next_ops->extents (nxdata, count, offset, flags, extents, err); +} + static struct nbdkit_filter filter = { .name = "delay", .longname = "nbdkit delay filter", @@ -217,6 +242,7 @@ static struct nbdkit_filter filter = { .pwrite = delay_pwrite, .zero = delay_zero, .trim = delay_trim, + .extents = delay_extents, }; NBDKIT_REGISTER_FILTER(filter) -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 08/19] error: Extend the error filter so it can inject errors into block status extents requests.
--- filters/error/nbdkit-error-filter.pod | 11 ++++++-- filters/error/error.c | 36 ++++++++++++++++++++++++--- 2 files changed, 41 insertions(+), 6 deletions(-) diff --git a/filters/error/nbdkit-error-filter.pod b/filters/error/nbdkit-error-filter.pod index 504e1b0..16ef184 100644 --- a/filters/error/nbdkit-error-filter.pod +++ b/filters/error/nbdkit-error-filter.pod @@ -12,6 +12,7 @@ nbdkit-error-filter - inject errors for testing clients [error-pwrite=...] [error-pwrite-rate=...] [error-pwrite-file=...] [error-trim=...] [error-trim-rate=...] [error-trim-file=...] [error-zero=...] [error-zero-rate=...] [error-zero-file=...] + [error-extents=...] [error-extents-rate=...] [error-extents-file=...] =head1 DESCRIPTION @@ -29,8 +30,9 @@ Inject a low rate of errors randomly into the connection: nbdkit --filter=error file disk.img error-rate=1% -Reading and trimming requests will be successful, but all -writes and zeroing will return "No space left on device": +Reading, trimming and extents (block status) requests will be +successful, but all writes and zeroing will return "No space left on +device": nbdkit --filter=error file disk.img \ error=ENOSPC \ @@ -105,6 +107,11 @@ settings to NBD trim requests. Same as C<error>, C<error-rate> and C<error-file> but only apply the settings to NBD zero requests. +=item B<error-extents>, B<error-extents-rate>, B<error-extents-file>. + +Same as C<error>, C<error-rate> and C<error-file> but only apply the +settings to NBD block status requests to read extents. + =back =head1 NOTES diff --git a/filters/error/error.c b/filters/error/error.c index 598bd1f..405bbe3 100644 --- a/filters/error/error.c +++ b/filters/error/error.c @@ -63,6 +63,7 @@ static struct error_settings pread_settings = ERROR_DEFAULT; static struct error_settings pwrite_settings = ERROR_DEFAULT; static struct error_settings trim_settings = ERROR_DEFAULT; static struct error_settings zero_settings = ERROR_DEFAULT; +static struct error_settings extents_settings = ERROR_DEFAULT; /* Random state. * This must only be accessed when holding the lock (except for load). @@ -83,6 +84,7 @@ error_unload (void) free (pwrite_settings.file); free (trim_settings.file); free (zero_settings.file); + free (extents_settings.file); } static const struct { const char *name; int error; } errors[] = { @@ -162,7 +164,8 @@ error_config (nbdkit_next_config *next, void *nxdata, if (parse_error (key, value, &i) == -1) return -1; pread_settings.error = pwrite_settings.error - trim_settings.error = zero_settings.error = i; + trim_settings.error = zero_settings.error + extents_settings.error = i; return 0; } else if (strcmp (key, "error-pread") == 0) @@ -173,12 +176,15 @@ error_config (nbdkit_next_config *next, void *nxdata, return parse_error (key, value, &trim_settings.error); else if (strcmp (key, "error-zero") == 0) return parse_error (key, value, &zero_settings.error); + else if (strcmp (key, "error-extents") == 0) + return parse_error (key, value, &extents_settings.error); else if (strcmp (key, "error-rate") == 0) { if (parse_error_rate (key, value, &d) == -1) return -1; pread_settings.rate = pwrite_settings.rate - trim_settings.rate = zero_settings.rate = d; + trim_settings.rate = zero_settings.rate + extents_settings.rate = d; return 0; } else if (strcmp (key, "error-pread-rate") == 0) @@ -189,6 +195,8 @@ error_config (nbdkit_next_config *next, void *nxdata, return parse_error_rate (key, value, &trim_settings.rate); else if (strcmp (key, "error-zero-rate") == 0) return parse_error_rate (key, value, &zero_settings.rate); + else if (strcmp (key, "error-extents-rate") == 0) + return parse_error_rate (key, value, &extents_settings.rate); /* NB: We are using nbdkit_absolute_path here because the trigger * file probably doesn't exist yet. @@ -202,6 +210,8 @@ error_config (nbdkit_next_config *next, void *nxdata, trim_settings.file = nbdkit_absolute_path (value); free (zero_settings.file); zero_settings.file = nbdkit_absolute_path (value); + free (extents_settings.file); + extents_settings.file = nbdkit_absolute_path (value); return 0; } else if (strcmp (key, "error-pread-rate") == 0) { @@ -224,6 +234,11 @@ error_config (nbdkit_next_config *next, void *nxdata, zero_settings.file = nbdkit_absolute_path (value); return 0; } + else if (strcmp (key, "error-extents-rate") == 0) { + free (extents_settings.file); + extents_settings.file = nbdkit_absolute_path (value); + return 0; + } else return next (nxdata, key, value); @@ -234,8 +249,8 @@ error_config (nbdkit_next_config *next, void *nxdata, " The error indication to return.\n" \ "error-rate=0%..100%|0..1 Rate of errors to generate.\n" \ "error-file=TRIGGER Set trigger filename.\n" \ - "error-pread*, error-pwrite*, error-trim*, error-zero*\n" \ - " Apply settings only to read/write/trim/zero" + "error-pread*, error-pwrite*, error-trim*, error-zero*, error-extents*\n" \ + " Apply settings only to read/write/etc" static void * error_open (nbdkit_next_open *next, void *nxdata, int readonly) @@ -330,6 +345,18 @@ error_zero (struct nbdkit_next_ops *next_ops, void *nxdata, return next_ops->zero (nxdata, count, offset, flags, err); } +/* Extents. */ +static int +error_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents, int *err) +{ + if (random_error (&extents_settings, "extents", err)) + return -1; + + return next_ops->extents (nxdata, count, offset, flags, extents, err); +} + static struct nbdkit_filter filter = { .name = "error", .longname = "nbdkit error filter", @@ -343,6 +370,7 @@ static struct nbdkit_filter filter = { .pwrite = error_pwrite, .trim = error_trim, .zero = error_zero, + .extents = error_extents, }; NBDKIT_REGISTER_FILTER(filter) -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 09/19] log: Log extents requests.
Typical output: 2019-03-26 18:22:17.287434 connection=2 Extents id=2 offset=0x0 count=0x40000000 req_one=1 ... 2019-03-26 18:22:17.290040 connection=2 ...Extents id=2 extents=[{ offset=0x0, length=0x250000, type=0 }, { offset=0x250000, length=0x7db0000, type=3 }, { offset=0x8000000, length=0x8000, type=0 }, { offset=0x8008000, length=0x38000, type=3 }, { offset=0x8040000, length=0x208000, type=0 }, { offset=0x8248000, length=0x7db8000, type=3 }, { offset=0x10000000, length=0x208000, type=0 }, { offset=0x10208000, length=0x7df8000, type=3 }, { offset=0x18000000, length=0x8000, type=0 }, { offset=0x18008000, length=0x38000, type=3 }, { offset=0x18040000, length=0x208000, type=0 }, { offset=0x18248000, length=0x7db8000, type=3 }, { offset=0x20000000, length=0x208000, type=0 }, { offset=0x20208000, length=0x7df8000, type=3 }, { offset=0x28000000, length=0x8000, type=0 }, { offset=0x28008000, length=0x38000, type=3 }, { offset=0x28040000, length=0x208000, type=0 }, { offset=0x28248000, length=0x7db8000, type=3 }, { offset=0x30000000, length=0x208000, type=0 }, { offset=0x30208000, length=0x7df8000, type=3 }, { offset=0x38000000, length=0x8000, type=0 }, { offset=0x38008000, length=0x38000, type=3 }, { offset=0x38040000, length=0x208000, type=0 }, { offset=0x38248000, length=0x7da8000, type=3 }, { offset=0x3fff0000, length=0x10000, type=0 }] return=0 As you can see this is logging the complete information as generated by the underlying plugin, not what is returned to the client (which is rather hard for the log filter to discern). It's probably more useful for debugging like this. --- filters/log/nbdkit-log-filter.pod | 6 ++--- filters/log/log.c | 45 +++++++++++++++++++++++++++++++ 2 files changed, 48 insertions(+), 3 deletions(-) diff --git a/filters/log/nbdkit-log-filter.pod b/filters/log/nbdkit-log-filter.pod index cff209e..e391046 100644 --- a/filters/log/nbdkit-log-filter.pod +++ b/filters/log/nbdkit-log-filter.pod @@ -46,9 +46,9 @@ the impact of the caching. This filter writes to the file specified by the C<logfile=FILE> parameter. All lines include a timestamp, a connection counter, then details about the command. The following actions are logged: Connect, -Read, Write, Zero, Trim, Flush, and Disconnect. Except for Connect -and Disconnect, an event is logged across two lines for call and -return value, to allow tracking duration and tracing any parallel +Read, Write, Zero, Trim, Extents, Flush, and Disconnect. Except for +Connect and Disconnect, an event is logged across two lines for call +and return value, to allow tracking duration and tracing any parallel execution, using id for correlation (incremented per action on the connection). diff --git a/filters/log/log.c b/filters/log/log.c index 9f2e37d..b31d784 100644 --- a/filters/log/log.c +++ b/filters/log/log.c @@ -353,6 +353,50 @@ log_zero (struct nbdkit_next_ops *next_ops, void *nxdata, return r; } +/* Extents. */ +static int +log_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offs, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + struct handle *h = handle; + uint64_t id = get_id (h); + int r; + + assert (!(flags & ~(NBDKIT_FLAG_REQ_ONE))); + output (h, "Extents", id, + "offset=0x%" PRIx64 " count=0x%x req_one=%d ...", + offs, count, !!(flags & NBDKIT_FLAG_REQ_ONE)); + r = next_ops->extents (nxdata, count, offs, flags, extents, err); + if (r == -1) + output_return (h, "...Extents", id, r, err); + else { + FILE *fp; + char *extents_str = NULL; + size_t i, n, len = 0; + + fp = open_memstream (&extents_str, &len); + if (fp != NULL) { + n = nbdkit_extents_count (extents); + for (i = 0; i < n; ++i) { + struct nbdkit_extent e = nbdkit_get_extent (extents, i); + if (i > 0) + fprintf (fp, ", "); + fprintf (fp, "{ offset=0x%" PRIx64 ", length=0x%" PRIx64 ", " + "type=%" PRIu32 " }", + e.offset, e.length, e.type); + } + + fclose (fp); + } + + output (h, "...Extents", id, "extents=[%s] return=0", + extents_str ? extents_str : "(null)"); + free (extents_str); + } + return r; +} + static struct nbdkit_filter filter = { .name = "log", .longname = "nbdkit log filter", @@ -370,6 +414,7 @@ static struct nbdkit_filter filter = { .flush = log_flush, .trim = log_trim, .zero = log_zero, + .extents = log_extents, }; NBDKIT_REGISTER_FILTER(filter) -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 10/19] offset: Implement mapping of extents.
Allows you to safely use nbdkit-offset-filter on top of a plugin supporting extents. --- filters/offset/offset.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/filters/offset/offset.c b/filters/offset/offset.c index 058571d..f949f6d 100644 --- a/filters/offset/offset.c +++ b/filters/offset/offset.c @@ -37,6 +37,7 @@ #include <stdlib.h> #include <stdint.h> #include <string.h> +#include <assert.h> #include <nbdkit-filter.h> @@ -132,6 +133,40 @@ offset_zero (struct nbdkit_next_ops *next_ops, void *nxdata, return next_ops->zero (nxdata, count, offs + offset, flags, err); } +/* Extents. */ +static int +offset_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offs, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + size_t i; + struct nbdkit_extents *extents2; + struct nbdkit_extent e; + int64_t real_size = next_ops->get_size (nxdata); + + extents2 = nbdkit_extents_new (offs + offset, real_size - offset); + if (extents2 == NULL) { + *err = errno; + return -1; + } + if (next_ops->extents (nxdata, count, offs + offset, + flags, extents2, err) == -1) + goto error; + + for (i = 0; i < nbdkit_extents_count (extents2); ++i) { + e = nbdkit_get_extent (extents2, i); + e.offset -= offset; + if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) + goto error; + } + nbdkit_extents_free (extents2); + return 0; + + error: + nbdkit_extents_free (extents2); + return -1; +} + static struct nbdkit_filter filter = { .name = "offset", .longname = "nbdkit offset filter", @@ -144,6 +179,7 @@ static struct nbdkit_filter filter = { .pwrite = offset_pwrite, .trim = offset_trim, .zero = offset_zero, + .extents = offset_extents, }; NBDKIT_REGISTER_FILTER(filter) -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 11/19] partition: Implement mapping of extents.
--- filters/partition/partition.c | 37 +++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/filters/partition/partition.c b/filters/partition/partition.c index c1fe7a1..bc60480 100644 --- a/filters/partition/partition.c +++ b/filters/partition/partition.c @@ -38,6 +38,7 @@ #include <stdint.h> #include <string.h> #include <inttypes.h> +#include <assert.h> #include <nbdkit-filter.h> @@ -222,6 +223,41 @@ partition_zero (struct nbdkit_next_ops *next_ops, void *nxdata, return next_ops->zero (nxdata, count, offs + h->offset, flags, err); } +/* Extents. */ +static int +partition_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offs, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + struct handle *h = handle; + size_t i; + struct nbdkit_extents *extents2; + struct nbdkit_extent e; + int64_t real_size = next_ops->get_size (nxdata); + + extents2 = nbdkit_extents_new (offs + h->offset, real_size - h->offset); + if (extents2 == NULL) { + *err = errno; + return -1; + } + if (next_ops->extents (nxdata, count, offs + h->offset, + flags, extents2, err) == -1) + goto error; + + for (i = 0; i < nbdkit_extents_count (extents2); ++i) { + e = nbdkit_get_extent (extents2, i); + e.offset -= h->offset; + if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) + goto error; + } + nbdkit_extents_free (extents2); + return 0; + + error: + nbdkit_extents_free (extents2); + return -1; +} + static struct nbdkit_filter filter = { .name = "partition", .longname = "nbdkit partition filter", @@ -237,6 +273,7 @@ static struct nbdkit_filter filter = { .pwrite = partition_pwrite, .trim = partition_trim, .zero = partition_zero, + .extents = partition_extents, }; NBDKIT_REGISTER_FILTER(filter) -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 12/19] truncate: Implement extents for beyond end of truncated region.
--- filters/truncate/truncate.c | 55 +++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/filters/truncate/truncate.c b/filters/truncate/truncate.c index b95432a..e8e56f7 100644 --- a/filters/truncate/truncate.c +++ b/filters/truncate/truncate.c @@ -285,6 +285,60 @@ truncate_zero (struct nbdkit_next_ops *next_ops, void *nxdata, return 0; } +/* Extents. */ +static int +truncate_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents, int *err) +{ + uint32_t n; + uint64_t real_size_copy; + struct nbdkit_extents *extents2; + size_t i; + + pthread_mutex_lock (&lock); + real_size_copy = real_size; + pthread_mutex_unlock (&lock); + + /* If the entire request is beyond the end of the underlying plugin + * then this is the easy case: return a hole. + */ + if (offset >= real_size_copy) + return nbdkit_add_extent (extents, offset, (uint64_t) count, + NBDKIT_EXTENT_ZERO|NBDKIT_EXTENT_HOLE); + + /* We're asked first for extents information about the plugin, then + * possibly (if truncating larger) for the hole after the plugin. + * Since we're not required to provide all of this information, the + * easiest thing is to only return data from the plugin. We will be + * called later about the hole. However we do need to make sure + * that the extents array is truncated to the real size, hence we + * have to create a new extents array, ask the plugin, then copy the + * returned data to the original array. + */ + extents2 = nbdkit_extents_new (0, real_size_copy); + if (offset + count <= real_size_copy) + n = count; + else + n = real_size_copy - offset; + if (next_ops->extents (nxdata, n, offset, flags, extents2, err) == -1) { + nbdkit_extents_free (extents2); + return -1; + } + + for (i = 0; i < nbdkit_extents_count (extents2); ++i) { + struct nbdkit_extent e = nbdkit_get_extent (extents2, i); + + if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) { + nbdkit_extents_free (extents2); + return -1; + } + } + nbdkit_extents_free (extents2); + + return 0; +} + static struct nbdkit_filter filter = { .name = "truncate", .longname = "nbdkit truncate filter", @@ -297,6 +351,7 @@ static struct nbdkit_filter filter = { .pwrite = truncate_pwrite, .trim = truncate_trim, .zero = truncate_zero, + .extents = truncate_extents, }; NBDKIT_REGISTER_FILTER(filter) -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 13/19] xz: Disable extents information in this filter.
However we ought to support this in future because xz files contain sparseness information, so add a note. --- filters/xz/xz.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/filters/xz/xz.c b/filters/xz/xz.c index ec41f25..3013538 100644 --- a/filters/xz/xz.c +++ b/filters/xz/xz.c @@ -185,6 +185,16 @@ xz_can_write (struct nbdkit_next_ops *next_ops, void *nxdata, return 0; } +/* Similar to above. However xz files themselves do support + * sparseness so in future we should generate extents information. XXX + */ +static int +xz_can_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + return 0; +} + /* Read data from the file. */ static int xz_pread (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -238,6 +248,7 @@ static struct nbdkit_filter filter = { .prepare = xz_prepare, .get_size = xz_get_size, .can_write = xz_can_write, + .can_extents = xz_can_extents, .pread = xz_pread, }; -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 14/19] data, memory: Implement extents.
These plugins are both based on the same sparse array structure which supports a simple implementation of extents. --- common/sparse/sparse.h | 7 +- common/sparse/sparse.c | 37 ++++++++++- plugins/data/data.c | 16 ++++- plugins/memory/memory.c | 16 ++++- README | 2 + tests/Makefile.am | 2 + tests/test-data-extents.sh | 131 +++++++++++++++++++++++++++++++++++++ 7 files changed, 207 insertions(+), 4 deletions(-) diff --git a/common/sparse/sparse.h b/common/sparse/sparse.h index 818d804..eb24a0b 100644 --- a/common/sparse/sparse.h +++ b/common/sparse/sparse.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -87,4 +87,9 @@ extern void sparse_array_zero (struct sparse_array *sa, uint32_t count, uint64_t offset) __attribute__((__nonnull__ (1))); +/* Return information about allocated pages and holes. */ +extern int sparse_array_extents (struct sparse_array *sa, + uint32_t count, uint64_t offset, + struct nbdkit_extents *extents); + #endif /* NBDKIT_SPARSE_H */ diff --git a/common/sparse/sparse.c b/common/sparse/sparse.c index a5ace48..d5654c2 100644 --- a/common/sparse/sparse.c +++ b/common/sparse/sparse.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -360,3 +360,38 @@ sparse_array_zero (struct sparse_array *sa, uint32_t count, uint64_t offset) offset += n; } } + +int +sparse_array_extents (struct sparse_array *sa, + uint32_t count, uint64_t offset, + struct nbdkit_extents *extents) +{ + uint32_t n, type; + void *p; + + while (count > 0) { + p = lookup (sa, offset, false, &n, NULL); + if (n > count) + n = count; + + /* Work out the type of this extent. */ + if (p == NULL) + /* No backing page, so it's a hole. */ + type = NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO; + else { + if (is_zero (p, n)) + /* A backing page and it's all zero, it's a zero extent. */ + type = NBDKIT_EXTENT_ZERO; + else + /* Normal allocated data. */ + type = 0; + } + if (nbdkit_add_extent (extents, offset, n, type) == -1) + return -1; + + count -= n; + offset += n; + } + + return 0; +} diff --git a/plugins/data/data.c b/plugins/data/data.c index f9d3881..c5540f6 100644 --- a/plugins/data/data.c +++ b/plugins/data/data.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018 Red Hat Inc. + * Copyright (C) 2018-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -378,6 +378,19 @@ data_trim (void *handle, uint32_t count, uint64_t offset) return 0; } +/* Extents. */ +static int +data_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) +{ + int r; + + pthread_mutex_lock (&lock); + r = sparse_array_extents (sa, count, offset, extents); + pthread_mutex_unlock (&lock); + return r; +} + static struct nbdkit_plugin plugin = { .name = "data", .version = PACKAGE_VERSION, @@ -394,6 +407,7 @@ static struct nbdkit_plugin plugin = { .pwrite = data_pwrite, .zero = data_zero, .trim = data_trim, + .extents = data_extents, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ diff --git a/plugins/memory/memory.c b/plugins/memory/memory.c index e27e127..741eaad 100644 --- a/plugins/memory/memory.c +++ b/plugins/memory/memory.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -170,6 +170,19 @@ memory_trim (void *handle, uint32_t count, uint64_t offset) return 0; } +/* Extents. */ +static int +memory_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) +{ + int r; + + pthread_mutex_lock (&lock); + r = sparse_array_extents (sa, count, offset, extents); + pthread_mutex_unlock (&lock); + return r; +} + static struct nbdkit_plugin plugin = { .name = "memory", .version = PACKAGE_VERSION, @@ -185,6 +198,7 @@ static struct nbdkit_plugin plugin = { .pwrite = memory_pwrite, .zero = memory_zero, .trim = memory_trim, + .extents = memory_extents, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ diff --git a/README b/README index 75230c9..497f4fd 100644 --- a/README +++ b/README @@ -152,6 +152,8 @@ To test for memory leaks (‘make check-valgrind’): For non-essential enhancements to the test suite: + - jq + - ip, ss (from iproute package) - losetup (from util-linux package) diff --git a/tests/Makefile.am b/tests/Makefile.am index dec44b5..174da29 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -53,6 +53,7 @@ EXTRA_DIST = \ test-cxx.sh \ test-data-7E.sh \ test-data-base64.sh \ + test-data-extents.sh \ test-data-file.sh \ test-data-raw.sh \ test-debug-flags.sh \ @@ -372,6 +373,7 @@ LIBGUESTFS_TESTS += test-data TESTS += \ test-data-7E.sh \ test-data-base64.sh \ + test-data-extents.sh \ test-data-file.sh \ test-data-raw.sh diff --git a/tests/test-data-extents.sh b/tests/test-data-extents.sh new file mode 100755 index 0000000..2b25bde --- /dev/null +++ b/tests/test-data-extents.sh @@ -0,0 +1,131 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2018-2019 Red Hat Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +# Unfortunately the output of this test depends on the PAGE_SIZE +# defined in common/sparse/sparse.c and would change (breaking the +# test) if we ever changed that definition. + +source ./functions.sh +set -e +set -x + +requires jq --version +requires qemu-img --version +requires qemu-img map --help + +out="test-data-extents.out" +expected="test-data-extents.expected" +files="$out $expected" +rm -f $files +cleanup_fn rm -f $files + +do_test () +{ + # We use jq to normalize the output and convert it to plain text. + nbdkit -U - data data="$1" size="$2" \ + --run 'qemu-img map -f raw --output=json $nbd' | + jq -c '.[] | {start:.start, length:.length, data:.data, zero:.zero}' \ + > $out + if ! cmp $out $expected; then + echo "$0: output did not match expected data" + echo "expected:" + cat $expected + echo "output:" + cat $out + exit 1 + fi +} + +# Completely sparse disk. +cat > $expected <<'EOF' +{"start":0,"length":1048576,"data":false,"zero":true} +EOF +do_test "" 1M + +# Completely allocated disk. +cat > $expected <<'EOF' +{"start":0,"length":32768,"data":true,"zero":false} +EOF +do_test "1" 32K + +# Allocated data at the start of a 1M disk. +cat > $expected <<'EOF' +{"start":0,"length":32768,"data":true,"zero":false} +{"start":32768,"length":1015808,"data":false,"zero":true} +EOF +do_test "1" 1M + +# Allocated zeroes at the start of the disk. This should create a +# zero extent (data=true zero=true). +cat > $expected <<'EOF' +{"start":0,"length":32768,"data":true,"zero":true} +{"start":32768,"length":1015808,"data":false,"zero":true} +EOF +do_test "0" 1M + +# Completely zero disk. +cat > $expected <<'EOF' +{"start":0,"length":32768,"data":true,"zero":true} +EOF +do_test "0 0 0" 32K + +# Allocated data and zero extents in a few places in the middle. +cat > $expected <<'EOF' +{"start":0,"length":32768,"data":false,"zero":true} +{"start":32768,"length":32768,"data":true,"zero":false} +{"start":65536,"length":32768,"data":true,"zero":true} +{"start":98304,"length":32768,"data":false,"zero":true} +{"start":131072,"length":32768,"data":true,"zero":false} +{"start":163840,"length":98304,"data":false,"zero":true} +{"start":262144,"length":32768,"data":true,"zero":true} +{"start":294912,"length":32768,"data":true,"zero":false} +{"start":327680,"length":720896,"data":false,"zero":true} +EOF +do_test "@32768 1 @65536 0 @131072 1 @262144 0 @294912 1" 1M + +# This should test coalescing in the extents code. +cat > $expected <<'EOF' +{"start":0,"length":32768,"data":false,"zero":true} +{"start":32768,"length":65536,"data":true,"zero":false} +{"start":98304,"length":32768,"data":false,"zero":true} +{"start":131072,"length":65536,"data":true,"zero":true} +{"start":196608,"length":851968,"data":false,"zero":true} +EOF +do_test "@32768 1 @65536 1 @131072 0 @163840 0" 1M + +# Zero-length plugin. Unlike nbdkit-zero-plugin, the data plugin +# advertises extents and so will behave differently. +cat > $expected <<'EOF' +{"start":0,"length":0,"data":false,"zero":false} +EOF +do_test "" 0 -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 15/19] file: Implement extents.
This uses lseek SEEK_DATA/SEEK_HOLE to search for allocated data and holes in the underlying file. --- plugins/file/file.c | 141 ++++++++++++++++++++++++++++++++++--- tests/Makefile.am | 5 ++ tests/test-file-extents.sh | 57 +++++++++++++++ 3 files changed, 193 insertions(+), 10 deletions(-) diff --git a/plugins/file/file.c b/plugins/file/file.c index 628f8fb..2fbc2a3 100644 --- a/plugins/file/file.c +++ b/plugins/file/file.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -37,6 +37,7 @@ #include <stdlib.h> #include <stdbool.h> #include <string.h> +#include <inttypes.h> #include <fcntl.h> #include <unistd.h> #include <sys/types.h> @@ -44,6 +45,8 @@ #include <sys/ioctl.h> #include <errno.h> +#include <pthread.h> + #if defined(__linux__) && !defined(FALLOC_FL_PUNCH_HOLE) #include <linux/falloc.h> /* For FALLOC_FL_*, glibc < 2.18 */ #endif @@ -68,7 +71,11 @@ static char *filename = NULL; -int file_debug_zero; /* to enable: -D file.zero=1 */ +/* Any callbacks using lseek must be protected by this lock. */ +static pthread_mutex_t lseek_lock = PTHREAD_MUTEX_INITIALIZER; + +/* to enable: -D file.zero=1 */ +int file_debug_zero; static void file_unload (void) @@ -220,6 +227,23 @@ file_close (void *handle) #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL +/* For block devices, stat->st_size is not the true size. The caller + * grabs the lseek_lock. + */ +static int64_t +block_device_size (int fd) +{ + off_t size; + + size = lseek (fd, 0, SEEK_END); + if (size == -1) { + nbdkit_error ("lseek (to find device size): %m"); + return -1; + } + + return size; +} + /* Get the file size. */ static int64_t file_get_size (void *handle) @@ -227,15 +251,11 @@ file_get_size (void *handle) struct handle *h = handle; if (h->is_block_device) { - /* Block device, so st_size will not be the true size. */ - off_t size; - - size = lseek (h->fd, 0, SEEK_END); - if (size == -1) { - nbdkit_error ("lseek (to find device size): %m"); - return -1; - } + int64_t size; + pthread_mutex_lock (&lseek_lock); + size = block_device_size (h->fd); + pthread_mutex_unlock (&lseek_lock); return size; } else { /* Regular file. */ @@ -501,6 +521,103 @@ file_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags) return 0; } +#ifdef SEEK_HOLE +/* Extents. */ + +static int +file_can_extents (void *handle) +{ + struct handle *h = handle; + off_t r; + + /* A simple test to see whether SEEK_HOLE etc is likely to work on + * the current filesystem. + */ + pthread_mutex_lock (&lseek_lock); + r = lseek (h->fd, 0, SEEK_HOLE); + pthread_mutex_unlock (&lseek_lock); + if (r == -1) { + nbdkit_debug ("extents disabled: lseek: SEEK_HOLE: %m"); + return 0; + } + return 1; +} + +static int +do_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) +{ + struct handle *h = handle; + const bool req_one = flags & NBDKIT_FLAG_REQ_ONE; + uint64_t end = offset + count; + + do { + off_t pos; + + pos = lseek (h->fd, offset, SEEK_DATA); + if (pos == -1) { + if (errno == ENXIO) { + /* The current man page does not describe this situation well, + * but a proposed change to POSIX adds these words for ENXIO: + * "or the whence argument is SEEK_DATA and the offset falls + * within the final hole of the file." + */ + pos = end; + } + else { + nbdkit_error ("lseek: SEEK_DATA: %" PRIu64 ": %m", offset); + return -1; + } + } + + /* We know there is a hole from offset to pos-1. */ + if (pos > offset) { + if (nbdkit_add_extent (extents, offset, pos - offset, + NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO) == -1) + return -1; + if (req_one) + break; + } + + offset = pos; + if (offset >= end) + break; + + pos = lseek (h->fd, offset, SEEK_HOLE); + if (pos == -1) { + nbdkit_error ("lseek: SEEK_HOLE: %" PRIu64 ": %m", offset); + return -1; + } + + /* We know there is data from offset to pos-1. */ + if (pos > offset) { + if (nbdkit_add_extent (extents, offset, pos - offset, + 0 /* allocated data */) == -1) + return -1; + if (req_one) + break; + } + + offset = pos; + } while (offset < end); + + return 0; +} + +static int +file_extents (void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents) +{ + int r; + + pthread_mutex_lock (&lseek_lock); + r = do_extents (handle, count, offset, flags, extents); + pthread_mutex_unlock (&lseek_lock); + + return r; +} +#endif /* SEEK_HOLE */ + static struct nbdkit_plugin plugin = { .name = "file", .longname = "nbdkit file plugin", @@ -522,6 +639,10 @@ static struct nbdkit_plugin plugin = { .flush = file_flush, .trim = file_trim, .zero = file_zero, +#ifdef SEEK_HOLE + .can_extents = file_can_extents, + .extents = file_extents, +#endif .errno_is_preserved = 1, }; diff --git a/tests/Makefile.am b/tests/Makefile.am index 174da29..64b2c45 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -64,6 +64,7 @@ EXTRA_DIST = \ test-error0.sh \ test-error10.sh \ test-error100.sh \ + test-file-extents.sh \ test-floppy.sh \ test-foreground.sh \ test-fua.sh \ @@ -419,6 +420,10 @@ test_file_block_SOURCES = test-file-block.c test.h test_file_block_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS) test_file_block_LDADD = libtest.la $(LIBGUESTFS_LIBS) +if HAVE_GUESTFISH +TESTS += test-file-extents.sh +endif HAVE_GUESTFISH + # floppy plugin test. if HAVE_GUESTFISH TESTS += test-floppy.sh diff --git a/tests/test-file-extents.sh b/tests/test-file-extents.sh new file mode 100755 index 0000000..1f33c88 --- /dev/null +++ b/tests/test-file-extents.sh @@ -0,0 +1,57 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2018-2019 Red Hat Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +# The output of qemu-img map on the local sparse file ‘disk’ should be +# identical to the output when we use the file plugin to read the +# extents. + +source ./functions.sh +set -e +set -x + +requires jq --version +requires qemu-img --version +requires qemu-img map --help + +files="test-file-extents.out test-file-extents.expected" +rm -f $files +cleanup_fn rm -f $files + +qemu-img map -f raw --output=json disk | + jq -c '.[] | {start:.start, length:.length, data:.data, zero:.zero}' \ + > test-file-extents.out +nbdkit -U - file disk --run 'qemu-img map -f raw --output=json $nbd' | + jq -c '.[] | {start:.start, length:.length, data:.data, zero:.zero}' \ + > test-file-extents.expected + +diff -u test-file-extents.out test-file-extents.expected -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 16/19] vddk: Implement extents.
This uses a new API VixDiskLib_QueryAllocatedBlocks provided in VDDK >= 6.7. Thanks: Martin Kletzander. --- plugins/vddk/vddk-structs.h | 15 +++- plugins/vddk/vddk.c | 169 ++++++++++++++++++++++++++++++++++++ 2 files changed, 183 insertions(+), 1 deletion(-) diff --git a/plugins/vddk/vddk-structs.h b/plugins/vddk/vddk-structs.h index dbed94a..df88322 100644 --- a/plugins/vddk/vddk-structs.h +++ b/plugins/vddk/vddk-structs.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -47,6 +47,9 @@ typedef uint64_t VixError; #define VIXDISKLIB_FLAG_OPEN_READ_ONLY 4 #define VIXDISKLIB_SECTOR_SIZE 512 +#define VIXDISKLIB_MIN_CHUNK_SIZE 128 +#define VIXDISKLIB_MAX_CHUNK_NUMBER (512*1024) + typedef void *VixDiskLibConnection; typedef void *VixDiskLibHandle; @@ -124,4 +127,14 @@ typedef struct VixDiskLibInfo { char *uuid; } VixDiskLibInfo; +typedef struct { + uint64_t offset; + uint64_t length; +} VixDiskLibBlock; + +typedef struct { + uint32_t numBlocks; + VixDiskLibBlock blocks[1]; +} VixDiskLibBlockList; + #endif /* NBDKIT_VDDK_STRUCTS_H */ diff --git a/plugins/vddk/vddk.c b/plugins/vddk/vddk.c index 56871cc..56c5335 100644 --- a/plugins/vddk/vddk.c +++ b/plugins/vddk/vddk.c @@ -45,12 +45,17 @@ #include <nbdkit-plugin.h> #include "isaligned.h" +#include "minmax.h" +#include "rounding.h" #include "vddk-structs.h" /* Enable extra disk info debugging with: -D vddk.diskinfo=1 */ int vddk_debug_diskinfo; +/* Enable debugging of extents code with: -D vddk.extents=1 */ +int vddk_debug_extents; + /* The VDDK APIs that we call. These globals are initialized when the * plugin is loaded (by vddk_load). */ @@ -67,6 +72,8 @@ static VixError (*VixDiskLib_GetInfo) (VixDiskLibHandle handle, VixDiskLibInfo * static void (*VixDiskLib_FreeInfo) (VixDiskLibInfo *info); static VixError (*VixDiskLib_Read) (VixDiskLibHandle handle, uint64_t start_sector, uint64_t nr_sectors, unsigned char *buf); static VixError (*VixDiskLib_Write) (VixDiskLibHandle handle, uint64_t start_sector, uint64_t nr_sectors, const unsigned char *buf); +static VixError (*VixDiskLib_QueryAllocatedBlocks) (VixDiskLibHandle diskHandle, uint64_t start_sector, uint64_t nr_sectors, uint64_t chunk_size, VixDiskLibBlockList **block_list); +static VixError (*VixDiskLib_FreeBlockList) (VixDiskLibBlockList *block_list); /* Parameters passed to InitEx. */ #define VDDK_MAJOR 5 @@ -174,6 +181,11 @@ vddk_load (void) VixDiskLib_FreeInfo = dlsym (dl, "VixDiskLib_FreeInfo"); VixDiskLib_Read = dlsym (dl, "VixDiskLib_Read"); VixDiskLib_Write = dlsym (dl, "VixDiskLib_Write"); + + /* Added in VDDK 6.7, these will be NULL for earlier versions: */ + VixDiskLib_QueryAllocatedBlocks + dlsym (dl, "VixDiskLib_QueryAllocatedBlocks"); + VixDiskLib_FreeBlockList = dlsym (dl, "VixDiskLib_FreeBlockList"); } static void @@ -570,6 +582,161 @@ vddk_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset) return 0; } +static int +vddk_can_extents (void *handle) +{ + struct vddk_handle *h = handle; + VixError err; + VixDiskLibBlockList *block_list; + + /* This call was added in VDDK 6.7. In earlier versions the + * function pointer will be NULL and we cannot query extents. + */ + if (VixDiskLib_QueryAllocatedBlocks == NULL) { + nbdkit_debug ("can_extents: VixDiskLib_QueryAllocatedBlocks == NULL, " + "probably this is VDDK < 6.7"); + return 0; + } + + /* However even when the call is available it rarely works well so + * the best thing we can do here is to try the call and if it's + * non-functional return false. + */ + DEBUG_CALL ("VixDiskLib_QueryAllocatedBlocks", + "handle, 0, %d sectors, %d sectors", + VIXDISKLIB_MIN_CHUNK_SIZE, VIXDISKLIB_MIN_CHUNK_SIZE); + err = VixDiskLib_QueryAllocatedBlocks (h->handle, + 0, VIXDISKLIB_MIN_CHUNK_SIZE, + VIXDISKLIB_MIN_CHUNK_SIZE, + &block_list); + if (err == VIX_OK) { + DEBUG_CALL ("VixDiskLib_FreeBlockList", "block_list"); + VixDiskLib_FreeBlockList (block_list); + } + if (err != VIX_OK) { + char *errmsg = VixDiskLib_GetErrorText (err, NULL); + nbdkit_debug ("can_extents: VixDiskLib_QueryAllocatedBlocks test failed, " + "extents support will be disabled: " + "original error: %s", + errmsg); + VixDiskLib_FreeErrorText (errmsg); + return 0; + } + + return 1; +} + +static int +add_extent (struct nbdkit_extents *extents, + uint64_t *position, uint64_t next_position, bool is_hole) +{ + const uint32_t type = is_hole ? NBDKIT_EXTENT_HOLE|NBDKIT_EXTENT_ZERO : 0; + const uint64_t length = next_position - *position; + + assert (*position <= next_position); + if (*position == next_position) + return 0; + + if (vddk_debug_extents) + nbdkit_debug ("adding extent type %s at [%" PRIu64 "...%" PRIu64 "]", + is_hole ? "hole" : "allocated data", + *position, next_position-1); + if (nbdkit_add_extent (extents, *position, length, type) == -1) + return -1; + + *position = next_position; + return 0; +} + +static int +vddk_extents (void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents) +{ + struct vddk_handle *h = handle; + bool req_one = flags & NBDKIT_FLAG_REQ_ONE; + uint64_t position, end, start_sector; + + position = offset; + end = offset + count; + + /* We can only query whole chunks. Therefore start with the first + * chunk before offset. + */ + start_sector + ROUND_DOWN (offset, VIXDISKLIB_MIN_CHUNK_SIZE * VIXDISKLIB_SECTOR_SIZE) + / VIXDISKLIB_SECTOR_SIZE; + while (start_sector * VIXDISKLIB_SECTOR_SIZE < end) { + VixError err; + uint32_t i; + uint64_t nr_chunks, nr_sectors; + VixDiskLibBlockList *block_list; + + assert (IS_ALIGNED (start_sector, VIXDISKLIB_MIN_CHUNK_SIZE)); + + nr_chunks + ROUND_UP (end - start_sector * VIXDISKLIB_SECTOR_SIZE, + VIXDISKLIB_MIN_CHUNK_SIZE * VIXDISKLIB_SECTOR_SIZE) + / (VIXDISKLIB_MIN_CHUNK_SIZE * VIXDISKLIB_SECTOR_SIZE); + nr_chunks = MIN (nr_chunks, VIXDISKLIB_MAX_CHUNK_NUMBER); + nr_sectors = nr_chunks * VIXDISKLIB_MIN_CHUNK_SIZE; + + DEBUG_CALL ("VixDiskLib_QueryAllocatedBlocks", + "handle, %" PRIu64 " sectors, %" PRIu64 " sectors, " + "%d sectors", + start_sector, nr_sectors, VIXDISKLIB_MIN_CHUNK_SIZE); + err = VixDiskLib_QueryAllocatedBlocks (h->handle, + start_sector, nr_sectors, + VIXDISKLIB_MIN_CHUNK_SIZE, + &block_list); + if (err != VIX_OK) { + VDDK_ERROR (err, "VixDiskLib_QueryAllocatedBlocks"); + return -1; + } + + for (i = 0; i < block_list->numBlocks; ++i) { + uint64_t blk_offset, blk_length; + + blk_offset = block_list->blocks[i].offset * VIXDISKLIB_SECTOR_SIZE; + blk_length = block_list->blocks[i].length * VIXDISKLIB_SECTOR_SIZE; + + /* The query returns allocated blocks. We must insert holes + * between the blocks as necessary. + */ + if (position < blk_offset && + add_extent (extents, &position, blk_offset, true) == -1) + goto error_in_add; + if (add_extent (extents, + &position, blk_offset + blk_length, false) == -1) { + error_in_add: + DEBUG_CALL ("VixDiskLib_FreeBlockList", "block_list"); + VixDiskLib_FreeBlockList (block_list); + return -1; + } + } + DEBUG_CALL ("VixDiskLib_FreeBlockList", "block_list"); + VixDiskLib_FreeBlockList (block_list); + + /* There's an implicit hole after the returned list of blocks, up + * to the end of the QueryAllocatedBlocks request. + */ + if (add_extent (extents, + &position, + (start_sector + nr_sectors) * VIXDISKLIB_SECTOR_SIZE, + true) == -1) + return -1; + + start_sector += nr_sectors; + + /* If one extent was requested, as long as we've added an extent + * overlapping the original offset we're done. + */ + if (req_one && position > offset) + break; + } + + return 0; +} + static struct nbdkit_plugin plugin = { .name = "vddk", .longname = "VMware VDDK plugin", @@ -585,6 +752,8 @@ static struct nbdkit_plugin plugin = { .get_size = vddk_get_size, .pread = vddk_pread, .pwrite = vddk_pwrite, + .can_extents = vddk_can_extents, + .extents = vddk_extents, }; NBDKIT_REGISTER_PLUGIN(plugin) -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 17/19] tests: offset: Add a test of extent handling in the offset filter.
--- tests/Makefile.am | 5 +- tests/test-offset-extents.sh | 99 ++++++++++++++++++++++++++++++++++++ 2 files changed, 103 insertions(+), 1 deletion(-) diff --git a/tests/Makefile.am b/tests/Makefile.am index 64b2c45..75561b2 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -84,6 +84,7 @@ EXTRA_DIST = \ test_ocaml_plugin.ml \ test-ocaml.c \ test-offset2.sh \ + test-offset-extents.sh \ test-parallel-file.sh \ test-parallel-nbd.sh \ test-partition1.sh \ @@ -863,7 +864,9 @@ test_offset_SOURCES = test-offset.c test.h test_offset_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS) test_offset_LDADD = libtest.la $(LIBGUESTFS_LIBS) -TESTS += test-offset2.sh +TESTS += \ + test-offset2.sh \ + test-offset-extents.sh # partition filter test. TESTS += test-partition1.sh diff --git a/tests/test-offset-extents.sh b/tests/test-offset-extents.sh new file mode 100755 index 0000000..60fbb47 --- /dev/null +++ b/tests/test-offset-extents.sh @@ -0,0 +1,99 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2018-2019 Red Hat Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +# Unfortunately the output of this test depends on the PAGE_SIZE +# defined in common/sparse/sparse.c and would change (breaking the +# test) if we ever changed that definition. + +source ./functions.sh +set -e +set -x + +requires jq --version +requires qemu-img --version +requires qemu-img map --help + +out="test-offset-extents.out" +expected="test-offset-extents.expected" +files="$out $expected" +rm -f $files +cleanup_fn rm -f $files + +do_test () +{ + # We use jq to normalize the output and convert it to plain text. + nbdkit -U - \ + --filter=offset \ + data data="$1" size="$2" \ + offset=1024 range=65536 \ + --run 'qemu-img map -f raw --output=json $nbd' | + jq -c '.[] | {start:.start, length:.length, data:.data, zero:.zero}' \ + > $out + if ! cmp $out $expected; then + echo "$0: output did not match expected data" + echo "expected:" + cat $expected + echo "output:" + cat $out + exit 1 + fi +} + +# Completely sparse disk. +cat > $expected <<'EOF' +{"start":0,"length":65536,"data":false,"zero":true} +EOF +do_test "" 1M + +# Completely allocated disk. +# +# We have to write a non-zero byte @32767 so that the sparse array +# doesn't detect the range from 1024-32767 as being all zeroes. +cat > $expected <<'EOF' +{"start":0,"length":65536,"data":true,"zero":false} +EOF +do_test "1 @32767 1 @32768 1 @65536 1 @98304 1" 128K + +# Allocated data at the start of a 1M disk. +cat > $expected <<'EOF' +{"start":0,"length":31744,"data":true,"zero":false} +{"start":31744,"length":33792,"data":false,"zero":true} +EOF +do_test "1 @32767 1" 1M + +# Allocated zeroes at the start of the disk. +cat > $expected <<'EOF' +{"start":0,"length":31744,"data":true,"zero":true} +{"start":31744,"length":33792,"data":false,"zero":true} +EOF +do_test "0" 1M -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 18/19] tests: Update layers test for can_extents.
Testing Block Status / extents is however rather more difficult, so this commit punts on it and just adds a comment to the code. --- tests/test-layers-filter.c | 21 +++++++++++++++++++++ tests/test-layers-plugin.c | 20 +++++++++++++++++++- tests/test-layers.c | 12 ++++++++++++ 3 files changed, 52 insertions(+), 1 deletion(-) diff --git a/tests/test-layers-filter.c b/tests/test-layers-filter.c index 6da6ee6..48ca8e6 100644 --- a/tests/test-layers-filter.c +++ b/tests/test-layers-filter.c @@ -176,6 +176,15 @@ test_layers_filter_can_multi_conn (struct nbdkit_next_ops *next_ops, return next_ops->can_multi_conn (nxdata); } +static int +test_layers_filter_can_extents (struct nbdkit_next_ops *next_ops, + void *nxdata, + void *handle) +{ + DEBUG_FUNCTION; + return next_ops->can_extents (nxdata); +} + static int test_layers_filter_pread (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, void *buf, @@ -223,6 +232,16 @@ test_layers_filter_zero (struct nbdkit_next_ops *next_ops, void *nxdata, return next_ops->zero (nxdata, count, offset, flags, err); } +static int +test_layers_filter_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, + uint32_t flags, struct nbdkit_extents *extents, + int *err) +{ + DEBUG_FUNCTION; + return next_ops->extents (nxdata, count, offset, flags, extents, err); +} + static struct nbdkit_filter filter = { .name = "testlayers" layer, .version = PACKAGE_VERSION, @@ -243,11 +262,13 @@ static struct nbdkit_filter filter = { .can_zero = test_layers_filter_can_zero, .can_fua = test_layers_filter_can_fua, .can_multi_conn = test_layers_filter_can_multi_conn, + .can_extents = test_layers_filter_can_extents, .pread = test_layers_filter_pread, .pwrite = test_layers_filter_pwrite, .flush = test_layers_filter_flush, .trim = test_layers_filter_trim, .zero = test_layers_filter_zero, + .extents = test_layers_filter_extents, }; NBDKIT_REGISTER_FILTER(filter) diff --git a/tests/test-layers-plugin.c b/tests/test-layers-plugin.c index 042ecc3..c33d12d 100644 --- a/tests/test-layers-plugin.c +++ b/tests/test-layers-plugin.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018 Red Hat Inc. + * Copyright (C) 2018-2019 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -144,6 +144,13 @@ test_layers_plugin_can_multi_conn (void *handle) return 1; } +static int +test_layers_plugin_can_extents (void *handle) +{ + DEBUG_FUNCTION; + return 1; +} + static int test_layers_plugin_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -186,6 +193,15 @@ test_layers_plugin_zero (void *handle, return 0; } +static int +test_layers_plugin_extents (void *handle, + uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents) +{ + DEBUG_FUNCTION; + return 0; +} + static struct nbdkit_plugin plugin = { .name = "testlayersplugin", .version = PACKAGE_VERSION, @@ -204,11 +220,13 @@ static struct nbdkit_plugin plugin = { .can_zero = test_layers_plugin_can_zero, .can_fua = test_layers_plugin_can_fua, .can_multi_conn = test_layers_plugin_can_multi_conn, + .can_extents = test_layers_plugin_can_extents, .pread = test_layers_plugin_pread, .pwrite = test_layers_plugin_pwrite, .flush = test_layers_plugin_flush, .trim = test_layers_plugin_trim, .zero = test_layers_plugin_zero, + .extents = test_layers_plugin_extents, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ diff --git a/tests/test-layers.c b/tests/test-layers.c index 96df9a6..7757f86 100644 --- a/tests/test-layers.c +++ b/tests/test-layers.c @@ -355,6 +355,12 @@ main (int argc, char *argv[]) "filter1: test_layers_filter_can_multi_conn", "test_layers_plugin_can_multi_conn", NULL); + log_verify_seen_in_order + ("filter3: test_layers_filter_can_extents", + "filter2: test_layers_filter_can_extents", + "filter1: test_layers_filter_can_extents", + "test_layers_plugin_can_extents", + NULL); fprintf (stderr, "%s: protocol connected\n", program_name); @@ -520,6 +526,12 @@ main (int argc, char *argv[]) "test_layers_plugin_zero", NULL); + /* XXX We should test NBD_CMD_BLOCK_STATUS here. However it + * requires that we negotiate structured replies and base:allocation + * in the handshake, and the format of the reply is more complex + * than what we expect in this naive test. + */ + /* Close the connection. */ fprintf (stderr, "%s: closing the connection\n", program_name); request.type = htobe16 (NBD_CMD_DISC); -- 2.20.1
Richard W.M. Jones
2019-Mar-28 16:18 UTC
[Libguestfs] [PATCH nbdkit v5 FINAL 19/19] tests: truncate: Add a test of extent handling in the truncate filter.
--- tests/Makefile.am | 4 +- tests/test-truncate-extents.sh | 106 +++++++++++++++++++++++++++++++++ 2 files changed, 109 insertions(+), 1 deletion(-) diff --git a/tests/Makefile.am b/tests/Makefile.am index 75561b2..4b56191 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -118,6 +118,7 @@ EXTRA_DIST = \ test-truncate1.sh \ test-truncate2.sh \ test-truncate3.sh \ + test-truncate-extents.sh \ test-vddk.sh \ test-version.sh \ test-version-filter.sh \ @@ -881,7 +882,8 @@ TESTS += test-rate.sh TESTS += \ test-truncate1.sh \ test-truncate2.sh \ - test-truncate3.sh + test-truncate3.sh \ + test-truncate-extents.sh # xz filter test. if HAVE_LIBLZMA diff --git a/tests/test-truncate-extents.sh b/tests/test-truncate-extents.sh new file mode 100755 index 0000000..1f5080e --- /dev/null +++ b/tests/test-truncate-extents.sh @@ -0,0 +1,106 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2018-2019 Red Hat Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +# Unfortunately the output of this test depends on the PAGE_SIZE +# defined in common/sparse/sparse.c and would change (breaking the +# test) if we ever changed that definition. + +source ./functions.sh +set -e +set -x + +requires jq --version +requires qemu-img --version +requires qemu-img map --help + +out="test-truncate-extents.out" +expected="test-truncate-extents.expected" +files="$out $expected" +rm -f $files +cleanup_fn rm -f $files + +do_test () +{ + # We use jq to normalize the output and convert it to plain text. + nbdkit -U - \ + --filter=truncate \ + data data="$1" size="$2" \ + truncate="$3" \ + --run 'qemu-img map -f raw --output=json $nbd' | + jq -c '.[] | {start:.start, length:.length, data:.data, zero:.zero}' \ + > $out + if ! cmp $out $expected; then + echo "$0: output did not match expected data" + echo "expected:" + cat $expected + echo "output:" + cat $out + exit 1 + fi +} + +# Completely sparse disk. +cat > $expected <<'EOF' +{"start":0,"length":65536,"data":false,"zero":true} +EOF +do_test "" 1M 65536 + +# Completely allocated disk. +cat > $expected <<'EOF' +{"start":0,"length":32768,"data":true,"zero":false} +EOF +do_test "1 @32768 1 @65536 1 @98304 1" 128K 32768 + +#---------------------------------------------------------------------- +# The above are the easy cases. Now let's truncate to a larger +# size which should create a hole at the end. + +# Completely sparse disk. +cat > $expected <<'EOF' +{"start":0,"length":1048576,"data":false,"zero":true} +EOF +do_test "" 65536 1M + +# Completely allocated disk. +cat > $expected <<'EOF' +{"start":0,"length":512,"data":true,"zero":false} +{"start":512,"length":1048064,"data":false,"zero":true} +EOF +do_test "1" 512 1M + +# Zero-length plugin. Unlike nbdkit-zero-plugin, the data plugin +# advertises extents and so will behave differently. +cat > $expected <<'EOF' +{"start":0,"length":0,"data":false,"zero":false} +EOF +do_test "" 0 0 -- 2.20.1
Eric Blake
2019-Mar-29 01:57 UTC
Re: [Libguestfs] [PATCH nbdkit v5 FINAL 12/19] truncate: Implement extents for beyond end of truncated region.
On 3/28/19 11:18 AM, Richard W.M. Jones wrote:> --- > filters/truncate/truncate.c | 55 +++++++++++++++++++++++++++++++++++++ > 1 file changed, 55 insertions(+) >> + > + /* If the entire request is beyond the end of the underlying plugin > + * then this is the easy case: return a hole. > + */ > + if (offset >= real_size_copy) > + return nbdkit_add_extent (extents, offset, (uint64_t) count, > + NBDKIT_EXTENT_ZERO|NBDKIT_EXTENT_HOLE);Why not return a hole up to the end of the file, rather than limiting things at the end of the client's request?> + > + /* We're asked first for extents information about the plugin, then > + * possibly (if truncating larger) for the hole after the plugin. > + * Since we're not required to provide all of this information, the > + * easiest thing is to only return data from the plugin. We will be > + * called later about the hole. However we do need to make sure > + * that the extents array is truncated to the real size, hence we > + * have to create a new extents array, ask the plugin, then copy the > + * returned data to the original array. > + */ > + extents2 = nbdkit_extents_new (0, real_size_copy);Why 0 for start instead of offset? You get the same result either way (since the copying code ignores the prefix), but it's probably a lot more efficient to not have to copy the extents for the prefix of the file. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-29 02:06 UTC
Re: [Libguestfs] [PATCH nbdkit v5 FINAL 00/19] Implement extents.
On 3/28/19 11:18 AM, Richard W.M. Jones wrote:> This has already been pushed upstream. I am simply posting these here > so we have a reference in the mailing list in case we find bugs later > (as I'm sure we will - it's a complex patch series). > > Great thanks to Eric Blake for tireless review on this one. It also > seems to have identified a few minor bugs in qemu along the way.We missed the nozero filter - when using the filter to change whether the client can see sparse handling extensions, we should probably have a way to force the entire image to appear as allocated (and NOT as zero) in order to time the client actually reading things (rather than a client that reads block status and then skips what it knows are holes in spite of us not allowing them to write zeroes).I don't know if it needs another knob independent from zeromode, or if it should just be a third mode, or if we can just wrap that into the existing zeromode=none. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-29 02:15 UTC
Re: [Libguestfs] [PATCH nbdkit v5 FINAL 00/19] Implement extents.
On 3/28/19 11:18 AM, Richard W.M. Jones wrote:> This has already been pushed upstream. I am simply posting these here > so we have a reference in the mailing list in case we find bugs later > (as I'm sure we will - it's a complex patch series). > > Great thanks to Eric Blake for tireless review on this one. It also > seems to have identified a few minor bugs in qemu along the way. >Should nbdkit-rate-filter.pod mention that extents is considered effectively free (same as zeroing and trimming)? I know that I want the nbd plugin to do extents passthrough, but I have some work to do to teach it do to NBD_OPT_GO/structured replies first. The full and null plugins should probably have a trivial .extents that reports completely sparse, all of the time :) Various plugins like partitioning and linuxdisk might be interesting to support extents on (particularly since we know we pad in between files); but it may be trickier to write. Can curl or ssh access extents information? Language bindings will also be an interesting exercise ;) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-29 02:24 UTC
Re: [Libguestfs] [PATCH nbdkit v5 FINAL 01/19] server: Implement extents/can_extents calls for plugins and filters.
On 3/28/19 11:18 AM, Richard W.M. Jones wrote:> This new pair of callbacks allows plugins to describe which extents in > the virtual disk are allocated, holes or zeroes. > ---> +++ b/server/extents.c > @@ -0,0 +1,210 @@ > +/* nbdkit > + * Copyright (C) 2019 Red Hat Inc. > + * All rights reserved.We started work on this phrase but it got waylaid by extents work; maybe now is a good time to revisit that?> +struct nbdkit_extents * > +nbdkit_extents_new (uint64_t start, uint64_t end) > +{ > + struct nbdkit_extents *r; > + > + if (start >= INT64_MAX || end >= INT64_MAX) {Is this an off-by-one? INT64_MAX is a valid offset in our memory plugin, while INT64_MAX + 1 is not.> + nbdkit_error ("nbdkit_extents_new: " > + "start (%" PRIu64 ") or end (%" PRIu64 ") >= INT64_MAX", > + start, end); > + errno = ERANGE; > + return NULL; > + } > + > + /* 0-length ranges are possible, so start == end is not an error. */ > + if (start > end) {The protocol declares them to be unspecified behavior (so a portable client shouldn't be doing it). Doesn't valid_range() in protocol.c already filter out 0-length requests from the client? Thus, I'm wondering if this should be >= instead of >> +int > +nbdkit_add_extent (struct nbdkit_extents *exts, > + uint64_t offset, uint64_t length, uint32_t type) > +{ > + uint64_t overlap; > + > + if (length == 0) > + return 0; > + > + /* Ignore extents beyond the end of the range. Unfortunately we > + * lost the information about whether this is contiguous with a > + * previously added extent so we can't check for API usage errors.Nice comment...> + */ > + if (offset >= exts->end) > + return 0;Is returning 0 always right, or should we also check whether the user tried to pass in offset > end as their very first nbdkit_extents_add (where we know that because nr_extents is 0 that it is an API usage error).> + > + /* Shorten extents that overlap the end of the range. */ > + if (offset + length >= exts->end) { > + overlap = offset + length - exts->end; > + length -= overlap; > + } > + > + /* If there are existing extents, the new extent must be contiguous. */ > + if (exts->nr_extents > 0) { > + const struct nbdkit_extent *ep; > + > + ep = &exts->extents[exts->nr_extents-1]; > + if (offset != ep->offset + ep->length) { > + nbdkit_error ("nbdkit_add_extent: " > + "extents must be added in ascending order and " > + "must be contiguous"); > + return -1; > + } > + } > + else { > + /* If there are no existing extents, and the new extent is > + * entirely before start, ignore it....should this one also mention that we can't check for API usage errors for extents prior to start? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-29 02:39 UTC
Re: [Libguestfs] [PATCH nbdkit v5 FINAL 12/19] truncate: Implement extents for beyond end of truncated region.
On 3/28/19 11:18 AM, Richard W.M. Jones wrote:> --- > filters/truncate/truncate.c | 55 +++++++++++++++++++++++++++++++++++++ > 1 file changed, 55 insertions(+) >> + extents2 = nbdkit_extents_new (0, real_size_copy); > + if (offset + count <= real_size_copy) > + n = count; > + else > + n = real_size_copy - offset; > + if (next_ops->extents (nxdata, n, offset, flags, extents2, err) == -1) { > + nbdkit_extents_free (extents2); > + return -1; > + } > + > + for (i = 0; i < nbdkit_extents_count (extents2); ++i) { > + struct nbdkit_extent e = nbdkit_get_extent (extents2, i); > + > + if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) { > + nbdkit_extents_free (extents2); > + return -1; > + } > + } > + nbdkit_extents_free (extents2); > +Should we be using the CLEANUP_EXTENTS_FREE macro here and in other filters? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2019-Mar-29 08:52 UTC
Re: [Libguestfs] [PATCH nbdkit v5 FINAL 00/19] Implement extents.
On Thu, Mar 28, 2019 at 09:15:17PM -0500, Eric Blake wrote:> On 3/28/19 11:18 AM, Richard W.M. Jones wrote: > > This has already been pushed upstream. I am simply posting these here > > so we have a reference in the mailing list in case we find bugs later > > (as I'm sure we will - it's a complex patch series). > > > > Great thanks to Eric Blake for tireless review on this one. It also > > seems to have identified a few minor bugs in qemu along the way. > > > > Should nbdkit-rate-filter.pod mention that extents is considered > effectively free (same as zeroing and trimming)?I guess it's covered by the existing docs. Extents aren't quite as "free" as zeroing and trimming but the structured reply message is still quite small.> I know that I want the nbd plugin to do extents passthrough, but I have > some work to do to teach it do to NBD_OPT_GO/structured replies first. > > The full and null plugins should probably have a trivial .extents that > reports completely sparse, all of the time :)I think it would be wrong for full wouldn't it? It could mean that the client would skip the read, thus not getting the intended ENOSPC error. For null I agree.> Various plugins like partitioning and linuxdisk might be interesting to > support extents on (particularly since we know we pad in between files); > but it may be trickier to write. > > Can curl or ssh access extents information?Not sure about curl, but I checked ssh (sftp) and it cannot. However at some point I intend to submit an extension to the sftp protocol (similar to my "fsync@openssh.com" extension) to support it. We will need it for efficient ssh access in virt-v2v one day.> Language bindings will also be an interesting exercise ;)Yes, the language bindings lag behind what's available because they aren't generated ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Martin Kletzander
2019-Apr-01 11:22 UTC
Re: [Libguestfs] [PATCH nbdkit v5 FINAL 08/19] error: Extend the error filter so it can inject errors into block status extents requests.
On Thu, Mar 28, 2019 at 04:18:35PM +0000, Richard W.M. Jones wrote:>--- > filters/error/nbdkit-error-filter.pod | 11 ++++++-- > filters/error/error.c | 36 ++++++++++++++++++++++++--- > 2 files changed, 41 insertions(+), 6 deletions(-) > >diff --git a/filters/error/nbdkit-error-filter.pod b/filters/error/nbdkit-error-filter.pod >index 504e1b0..16ef184 100644 >--- a/filters/error/nbdkit-error-filter.pod >+++ b/filters/error/nbdkit-error-filter.pod >@@ -12,6 +12,7 @@ nbdkit-error-filter - inject errors for testing clients > [error-pwrite=...] [error-pwrite-rate=...] [error-pwrite-file=...] > [error-trim=...] [error-trim-rate=...] [error-trim-file=...] > [error-zero=...] [error-zero-rate=...] [error-zero-file=...] >+ [error-extents=...] [error-extents-rate=...] [error-extents-file=...] > > =head1 DESCRIPTION > >@@ -29,8 +30,9 @@ Inject a low rate of errors randomly into the connection: > > nbdkit --filter=error file disk.img error-rate=1% > >-Reading and trimming requests will be successful, but all >-writes and zeroing will return "No space left on device": >+Reading, trimming and extents (block status) requests will be >+successful, but all writes and zeroing will return "No space left on >+device": > > nbdkit --filter=error file disk.img \ > error=ENOSPC \ >@@ -105,6 +107,11 @@ settings to NBD trim requests. > Same as C<error>, C<error-rate> and C<error-file> but only apply the > settings to NBD zero requests. > >+=item B<error-extents>, B<error-extents-rate>, B<error-extents-file>. >+ >+Same as C<error>, C<error-rate> and C<error-file> but only apply the >+settings to NBD block status requests to read extents. >+ > =back > > =head1 NOTES >diff --git a/filters/error/error.c b/filters/error/error.c >index 598bd1f..405bbe3 100644 >--- a/filters/error/error.c >+++ b/filters/error/error.c >@@ -63,6 +63,7 @@ static struct error_settings pread_settings = ERROR_DEFAULT; > static struct error_settings pwrite_settings = ERROR_DEFAULT; > static struct error_settings trim_settings = ERROR_DEFAULT; > static struct error_settings zero_settings = ERROR_DEFAULT; >+static struct error_settings extents_settings = ERROR_DEFAULT; > > /* Random state. > * This must only be accessed when holding the lock (except for load). >@@ -83,6 +84,7 @@ error_unload (void) > free (pwrite_settings.file); > free (trim_settings.file); > free (zero_settings.file); >+ free (extents_settings.file); > } > > static const struct { const char *name; int error; } errors[] = { >@@ -162,7 +164,8 @@ error_config (nbdkit_next_config *next, void *nxdata, > if (parse_error (key, value, &i) == -1) > return -1; > pread_settings.error = pwrite_settings.error >- trim_settings.error = zero_settings.error = i; >+ trim_settings.error = zero_settings.error >+ extents_settings.error = i; > return 0; > } > else if (strcmp (key, "error-pread") == 0) >@@ -173,12 +176,15 @@ error_config (nbdkit_next_config *next, void *nxdata, > return parse_error (key, value, &trim_settings.error); > else if (strcmp (key, "error-zero") == 0) > return parse_error (key, value, &zero_settings.error); >+ else if (strcmp (key, "error-extents") == 0) >+ return parse_error (key, value, &extents_settings.error); > > else if (strcmp (key, "error-rate") == 0) { > if (parse_error_rate (key, value, &d) == -1) > return -1; > pread_settings.rate = pwrite_settings.rate >- trim_settings.rate = zero_settings.rate = d; >+ trim_settings.rate = zero_settings.rate >+ extents_settings.rate = d; > return 0; > } > else if (strcmp (key, "error-pread-rate") == 0) >@@ -189,6 +195,8 @@ error_config (nbdkit_next_config *next, void *nxdata, > return parse_error_rate (key, value, &trim_settings.rate); > else if (strcmp (key, "error-zero-rate") == 0) > return parse_error_rate (key, value, &zero_settings.rate); >+ else if (strcmp (key, "error-extents-rate") == 0) >+ return parse_error_rate (key, value, &extents_settings.rate); > > /* NB: We are using nbdkit_absolute_path here because the trigger > * file probably doesn't exist yet. >@@ -202,6 +210,8 @@ error_config (nbdkit_next_config *next, void *nxdata, > trim_settings.file = nbdkit_absolute_path (value); > free (zero_settings.file); > zero_settings.file = nbdkit_absolute_path (value); >+ free (extents_settings.file); >+ extents_settings.file = nbdkit_absolute_path (value); > return 0; > } > else if (strcmp (key, "error-pread-rate") == 0) { >@@ -224,6 +234,11 @@ error_config (nbdkit_next_config *next, void *nxdata, > zero_settings.file = nbdkit_absolute_path (value); > return 0; > } >+ else if (strcmp (key, "error-extents-rate") == 0) {This should be "error-extents-file", right? And the same applies for all similar conditions above this (not seen in the context).>+ free (extents_settings.file); >+ extents_settings.file = nbdkit_absolute_path (value); >+ return 0; >+ } > > else > return next (nxdata, key, value); >@@ -234,8 +249,8 @@ error_config (nbdkit_next_config *next, void *nxdata, > " The error indication to return.\n" \ > "error-rate=0%..100%|0..1 Rate of errors to generate.\n" \ > "error-file=TRIGGER Set trigger filename.\n" \ >- "error-pread*, error-pwrite*, error-trim*, error-zero*\n" \ >- " Apply settings only to read/write/trim/zero" >+ "error-pread*, error-pwrite*, error-trim*, error-zero*, error-extents*\n" \ >+ " Apply settings only to read/write/etc" > > static void * > error_open (nbdkit_next_open *next, void *nxdata, int readonly) >@@ -330,6 +345,18 @@ error_zero (struct nbdkit_next_ops *next_ops, void *nxdata, > return next_ops->zero (nxdata, count, offset, flags, err); > } > >+/* Extents. */ >+static int >+error_extents (struct nbdkit_next_ops *next_ops, void *nxdata, >+ void *handle, uint32_t count, uint64_t offset, >+ uint32_t flags, struct nbdkit_extents *extents, int *err) >+{ >+ if (random_error (&extents_settings, "extents", err)) >+ return -1; >+ >+ return next_ops->extents (nxdata, count, offset, flags, extents, err); >+} >+ > static struct nbdkit_filter filter = { > .name = "error", > .longname = "nbdkit error filter", >@@ -343,6 +370,7 @@ static struct nbdkit_filter filter = { > .pwrite = error_pwrite, > .trim = error_trim, > .zero = error_zero, >+ .extents = error_extents, > }; > > NBDKIT_REGISTER_FILTER(filter) >-- >2.20.1 >
Martin Kletzander
2019-Apr-01 11:33 UTC
Re: [Libguestfs] [PATCH nbdkit v5 FINAL 09/19] log: Log extents requests.
On Thu, Mar 28, 2019 at 04:18:36PM +0000, Richard W.M. Jones wrote:>Typical output: > >2019-03-26 18:22:17.287434 connection=2 Extents id=2 offset=0x0 count=0x40000000 req_one=1 ... >2019-03-26 18:22:17.290040 connection=2 ...Extents id=2 extents=[{ offset=0x0, length=0x250000, type=0 }, { offset=0x250000, length=0x7db0000, type=3 }, { offset=0x8000000, length=0x8000, type=0 }, { offset=0x8008000, length=0x38000, type=3 }, { offset=0x8040000, length=0x208000, type=0 }, { offset=0x8248000, length=0x7db8000, type=3 }, { offset=0x10000000, length=0x208000, type=0 }, { offset=0x10208000, length=0x7df8000, type=3 }, { offset=0x18000000, length=0x8000, type=0 }, { offset=0x18008000, length=0x38000, type=3 }, { offset=0x18040000, length=0x208000, type=0 }, { offset=0x18248000, length=0x7db8000, type=3 }, { offset=0x20000000, length=0x208000, type=0 }, { offset=0x20208000, length=0x7df8000, type=3 }, { offset=0x28000000, length=0x8000, type=0 }, { offset=0x28008000, length=0x38000, type=3 }, { offset=0x28040000, length=0x208000, type=0 }, { offset=0x28248000, length=0x7db8000, type=3 }, { offset=0x30000000, length=0x208000, type=0 }, { offset=0x30208000, length=0x7df8000, type=3 }, { offset=0x38000000, length=0x8000, type=0 }, { offset=0x38008000, length=0x38000, type=3 }, { offset=0x38040000, length=0x208000, type=0 }, { offset=0x38248000, length=0x7da8000, type=3 }, { offset=0x3fff0000, length=0x10000, type=0 }] return=0 > >As you can see this is logging the complete information as generated >by the underlying plugin, not what is returned to the client (which is >rather hard for the log filter to discern). It's probably more useful >for debugging like this. >--- > filters/log/nbdkit-log-filter.pod | 6 ++--- > filters/log/log.c | 45 +++++++++++++++++++++++++++++++ > 2 files changed, 48 insertions(+), 3 deletions(-) > >diff --git a/filters/log/nbdkit-log-filter.pod b/filters/log/nbdkit-log-filter.pod >index cff209e..e391046 100644 >--- a/filters/log/nbdkit-log-filter.pod >+++ b/filters/log/nbdkit-log-filter.pod >@@ -46,9 +46,9 @@ the impact of the caching. > This filter writes to the file specified by the C<logfile=FILE> > parameter. All lines include a timestamp, a connection counter, then > details about the command. The following actions are logged: Connect, >-Read, Write, Zero, Trim, Flush, and Disconnect. Except for Connect >-and Disconnect, an event is logged across two lines for call and >-return value, to allow tracking duration and tracing any parallel >+Read, Write, Zero, Trim, Extents, Flush, and Disconnect. Except for >+Connect and Disconnect, an event is logged across two lines for call >+and return value, to allow tracking duration and tracing any parallel > execution, using id for correlation (incremented per action on the > connection). > >diff --git a/filters/log/log.c b/filters/log/log.c >index 9f2e37d..b31d784 100644 >--- a/filters/log/log.c >+++ b/filters/log/log.c >@@ -353,6 +353,50 @@ log_zero (struct nbdkit_next_ops *next_ops, void *nxdata, > return r; > } > >+/* Extents. */ >+static int >+log_extents (struct nbdkit_next_ops *next_ops, void *nxdata, >+ void *handle, uint32_t count, uint64_t offs, uint32_t flags, >+ struct nbdkit_extents *extents, int *err) >+{ >+ struct handle *h = handle; >+ uint64_t id = get_id (h); >+ int r; >+ >+ assert (!(flags & ~(NBDKIT_FLAG_REQ_ONE))); >+ output (h, "Extents", id, >+ "offset=0x%" PRIx64 " count=0x%x req_one=%d ...", >+ offs, count, !!(flags & NBDKIT_FLAG_REQ_ONE)); >+ r = next_ops->extents (nxdata, count, offs, flags, extents, err); >+ if (r == -1) >+ output_return (h, "...Extents", id, r, err); >+ else { >+ FILE *fp; >+ char *extents_str = NULL; >+ size_t i, n, len = 0; >+ >+ fp = open_memstream (&extents_str, &len); >+ if (fp != NULL) { >+ n = nbdkit_extents_count (extents); >+ for (i = 0; i < n; ++i) { >+ struct nbdkit_extent e = nbdkit_get_extent (extents, i); >+ if (i > 0) >+ fprintf (fp, ", "); >+ fprintf (fp, "{ offset=0x%" PRIx64 ", length=0x%" PRIx64 ", " >+ "type=%" PRIu32 " }", >+ e.offset, e.length, e.type);I'm not sure what the consensus in nbdkit is, but it would be nicer to see what the type is, so that one does not have to go look in the header files.>+ } >+ >+ fclose (fp); >+ } >+ >+ output (h, "...Extents", id, "extents=[%s] return=0", >+ extents_str ? extents_str : "(null)"); >+ free (extents_str); >+ } >+ return r; >+} >+ > static struct nbdkit_filter filter = { > .name = "log", > .longname = "nbdkit log filter", >@@ -370,6 +414,7 @@ static struct nbdkit_filter filter = { > .flush = log_flush, > .trim = log_trim, > .zero = log_zero, >+ .extents = log_extents, > }; > > NBDKIT_REGISTER_FILTER(filter) >-- >2.20.1 >
Martin Kletzander
2019-Apr-01 11:42 UTC
Re: [Libguestfs] [PATCH nbdkit v5 FINAL 11/19] partition: Implement mapping of extents.
On Thu, Mar 28, 2019 at 04:18:38PM +0000, Richard W.M. Jones wrote:>--- > filters/partition/partition.c | 37 +++++++++++++++++++++++++++++++++++ > 1 file changed, 37 insertions(+) > >diff --git a/filters/partition/partition.c b/filters/partition/partition.c >index c1fe7a1..bc60480 100644 >--- a/filters/partition/partition.c >+++ b/filters/partition/partition.c >@@ -38,6 +38,7 @@ > #include <stdint.h> > #include <string.h> > #include <inttypes.h> >+#include <assert.h> >Either: a) you forgot `assert (!(flags & ~(NBDKIT_FLAG_REQ_ONE)));` in this and the previous patch, b) this is a leftover from previous version, or c) I missed something.> #include <nbdkit-filter.h> > >@@ -222,6 +223,41 @@ partition_zero (struct nbdkit_next_ops *next_ops, void *nxdata, > return next_ops->zero (nxdata, count, offs + h->offset, flags, err); > } > >+/* Extents. */ >+static int >+partition_extents (struct nbdkit_next_ops *next_ops, void *nxdata, >+ void *handle, uint32_t count, uint64_t offs, uint32_t flags, >+ struct nbdkit_extents *extents, int *err) >+{ >+ struct handle *h = handle; >+ size_t i; >+ struct nbdkit_extents *extents2; >+ struct nbdkit_extent e; >+ int64_t real_size = next_ops->get_size (nxdata); >+ >+ extents2 = nbdkit_extents_new (offs + h->offset, real_size - h->offset); >+ if (extents2 == NULL) { >+ *err = errno; >+ return -1; >+ } >+ if (next_ops->extents (nxdata, count, offs + h->offset, >+ flags, extents2, err) == -1) >+ goto error; >+ >+ for (i = 0; i < nbdkit_extents_count (extents2); ++i) { >+ e = nbdkit_get_extent (extents2, i); >+ e.offset -= h->offset; >+ if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) >+ goto error; >+ } >+ nbdkit_extents_free (extents2); >+ return 0; >+ >+ error: >+ nbdkit_extents_free (extents2); >+ return -1; >+} >+ > static struct nbdkit_filter filter = { > .name = "partition", > .longname = "nbdkit partition filter", >@@ -237,6 +273,7 @@ static struct nbdkit_filter filter = { > .pwrite = partition_pwrite, > .trim = partition_trim, > .zero = partition_zero, >+ .extents = partition_extents, > }; > > NBDKIT_REGISTER_FILTER(filter) >-- >2.20.1 >
Eric Blake
2019-Apr-23 17:36 UTC
Re: [Libguestfs] [PATCH nbdkit v5 FINAL 10/19] offset: Implement mapping of extents.
On 3/28/19 11:18 AM, Richard W.M. Jones wrote:> Allows you to safely use nbdkit-offset-filter on top of a plugin > supporting extents. > --- > filters/offset/offset.c | 36 ++++++++++++++++++++++++++++++++++++ > 1 file changed, 36 insertions(+) >> + > + extents2 = nbdkit_extents_new (offs + offset, real_size - offset); > + if (extents2 == NULL) { > + *err = errno; > + return -1; > + }Here, we are careful to set *err.> + if (next_ops->extents (nxdata, count, offs + offset, > + flags, extents2, err) == -1) > + goto error;And here.> + > + for (i = 0; i < nbdkit_extents_count (extents2); ++i) { > + e = nbdkit_get_extent (extents2, i); > + e.offset -= offset; > + if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) > + goto error;But here, *err remains unchanged. Is this a problem? Should nbdkit_add_extent() guarantee that errno is sane on failure (right now, it does not)? Affects several filters. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Apr-23 19:46 UTC
Re: [Libguestfs] [PATCH nbdkit v5 FINAL 14/19] data, memory: Implement extents.
On 3/28/19 11:18 AM, Richard W.M. Jones wrote:> These plugins are both based on the same sparse array structure which > supports a simple implementation of extents. > --- > common/sparse/sparse.h | 7 +- > common/sparse/sparse.c | 37 ++++++++++- > plugins/data/data.c | 16 ++++- > plugins/memory/memory.c | 16 ++++- > README | 2 + > tests/Makefile.am | 2 + > tests/test-data-extents.sh | 131 +++++++++++++++++++++++++++++++++++++ > 7 files changed, 207 insertions(+), 4 deletions(-) > > +int > +sparse_array_extents (struct sparse_array *sa, > + uint32_t count, uint64_t offset, > + struct nbdkit_extents *extents) > +{ > + uint32_t n, type; > + void *p; > + > + while (count > 0) { > + p = lookup (sa, offset, false, &n, NULL); > + if (n > count) > + n = count;I know in earlier review I asked whether we can move the clamping...> + > + /* Work out the type of this extent. */ > + if (p == NULL) > + /* No backing page, so it's a hole. */ > + type = NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO; > + else { > + if (is_zero (p, n)) > + /* A backing page and it's all zero, it's a zero extent. */ > + type = NBDKIT_EXTENT_ZERO; > + else > + /* Normal allocated data. */ > + type = 0; > + } > + if (nbdkit_add_extent (extents, offset, n, type) == -1) > + return -1; > +...to here, after the final nbdkit_add_extent, so that we can return a larger extent than the client's request. I remember when I originally asked, you declined due to odd interactions with REQ_ONE semantics, but since then, we changed how add_extent() works. Does it work now to defer the clamping? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Apr-24 19:39 UTC
Re: [Libguestfs] [PATCH nbdkit v5 FINAL 15/19] file: Implement extents.
On 3/28/19 11:18 AM, Richard W.M. Jones wrote:> This uses lseek SEEK_DATA/SEEK_HOLE to search for allocated data and > holes in the underlying file. > --- > plugins/file/file.c | 141 ++++++++++++++++++++++++++++++++++--- > tests/Makefile.am | 5 ++ > tests/test-file-extents.sh | 57 +++++++++++++++ > 3 files changed, 193 insertions(+), 10 deletions(-) >> +#ifdef SEEK_HOLE > +/* Extents. */ > + > +static int > +file_can_extents (void *handle) > +{ > + struct handle *h = handle; > + off_t r; > + > + /* A simple test to see whether SEEK_HOLE etc is likely to work on > + * the current filesystem. > + */ > + pthread_mutex_lock (&lseek_lock); > + r = lseek (h->fd, 0, SEEK_HOLE); > + pthread_mutex_unlock (&lseek_lock); > + if (r == -1) { > + nbdkit_debug ("extents disabled: lseek: SEEK_HOLE: %m"); > + return 0; > + } > + return 1; > +}Should we also return 0 if the lseek() returned the offset of EOF? More concretely, what if we test: r = lseek(h->fd, lseek(h->fd, 0, SEEK_HOLE), SEEK_DATA); If r is non-negative, then we have a sparse file; if the inner SEEK_HOLE failed then the outer lseek(, -1, SEEK_DATA) should fail with EINVAL; while if SEEK_HOLE succeeded but reported the offset of EOF, then SEEK_DATA should fail due to ENXIO and we know the file has no holes at all. In favor of the double check: a completely non-sparse file is just wasting time on further lseek() calls that won't tell us anything different than our default of all data (and which may be helpful for things like tmpfs that have abysmal lseek() performance). Perhaps a reason against: if we expect NBD_CMD_TRIM/NBD_CMD_WRITE_ZEROES or even a third-party to be punching holes in the meantime, then opting out of future lseek() won't see those later additions of holes. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Apr-27 19:56 UTC
Re: [Libguestfs] [PATCH nbdkit v5 FINAL 09/19] log: Log extents requests.
On 3/28/19 11:18 AM, Richard W.M. Jones wrote:> Typical output: >> +++ b/filters/log/log.c > @@ -353,6 +353,50 @@ log_zero (struct nbdkit_next_ops *next_ops, void *nxdata, > return r; > } > > +/* Extents. */ > +static int > +log_extents (struct nbdkit_next_ops *next_ops, void *nxdata, > + void *handle, uint32_t count, uint64_t offs, uint32_t flags, > + struct nbdkit_extents *extents, int *err) > +{Missed a change to log_prepare() so that the log will include results of .can_extents. I've pushed the obvious fix. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Possibly Parallel Threads
- [PATCH nbdkit 8/8] file: Implement extents.
- [PATCH nbdkit v2 1/3] file: Move file operators to a new common/fileops mini-library.
- Re: [PATCH nbdkit 8/8] file: Implement extents.
- [nbdkit PATCH] split: Add support for .extents
- [nbdkit PATCH 05/10] plugins: Wire up file-based plugin support for NBD_INFO_INIT_STATE