Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 00/15] Implement Block Status.
I'm not sure exactly which version we're up to, but let's say it's version 4. I'm a lot happier with this version: - all filters have been reviewed and changed where I think that's necessary - can_extents is properly defined and implemented now - NBD protocol is followed - I believe it addresses all previous review points where possible The "only" thing missing is it desperately needs proper tests, which is the next thing on my to-do list. So that means the patch series still isn't complete and ready for upstream, but it's certainly ready for review. Rich.
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 01/15] server: Implement extents/can_extents calls for plugins and filters.
This pair of calls allows plugins to describe which extents in the
virtual disk are allocated, holes or zeroes.
---
 docs/nbdkit-filter.pod  |  89 +++++++++++++++++++
 docs/nbdkit-plugin.pod  |  96 +++++++++++++++++++++
 include/nbdkit-common.h |  10 ++-
 include/nbdkit-filter.h |  22 ++++-
 include/nbdkit-plugin.h |   6 +-
 server/internal.h       |   4 +
 server/extents.c        | 186 ++++++++++++++++++++++++++++++++++++++++
 server/filters.c        |  59 +++++++++++++
 server/plugins.c        |  54 +++++++++++-
 server/Makefile.am      |   1 +
 server/nbdkit.syms      |   5 ++
 11 files changed, 528 insertions(+), 4 deletions(-)
diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod
index dc9a262..6b22f21 100644
--- a/docs/nbdkit-filter.pod
+++ b/docs/nbdkit-filter.pod
@@ -350,6 +350,8 @@ calls.
 
 =head2 C<.can_zero>
 
+=head2 C<.can_extents>
+
 =head2 C<.can_fua>
 
 =head2 C<.can_multi_conn>
@@ -365,6 +367,8 @@ calls.
                   void *handle);
  int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata,
                   void *handle);
+ int (*can_extents) (struct nbdkit_next_ops *next_ops, void *nxdata,
+                     void *handle);
  int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata,
                  void *handle);
  int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata,
@@ -513,6 +517,91 @@ value to return to the client.  The filter should never
fail with
 C<EOPNOTSUPP> (while plugins have automatic fallback to C<.pwrite>,
 filters do not).
 
+=head2 C<.extents>
+
+ int (*extents) (struct nbdkit_next_ops *next_ops, void *nxdata,
+                 void *handle, uint32_t count, uint64_t offset, uint32_t flags,
+                 struct nbdkit_extents *extents,
+                 int *err);
+
+This intercepts the plugin C<.extents> method and can be used to
+modify extent requests.
+
+This function will not be called if C<.can_extents> returned false; in
+turn, the filter should not call C<next_ops-E<gt>extents> if
+C<next_ops-E<gt>can_extents> did not return true.
+
+The C<extents> parameter passed to this function is empty, and
+should usually be passed straight to the underlying plugin:
+
+ myfilter_extents (...)
+ {
+   return next_ops->extents (nxdata, count, offset, flags,
+                             extents, err);
+ }
+
+It is also possible for filters to transform the extents list received
+back from the layer below.  Without error checking it would look like
+this:
+
+ myfilter_extents (..., uint32_t count, uint64_t offset, ...)
+ {
+   size_t i;
+   struct nbdkit_extents *extents2;
+   struct nbdkit_extent e;
+
+   extents2 = nbdkit_extents_new (offset + shift);
+   next_ops->extents (nxdata, count, offset + shift, flags, extents2, err);
+   for (i = 0; i < nbdkit_extents_count (extents2); ++i) {
+     e = nbdkit_get_extent (extents2, i);
+     e.offset -= shift;
+     nbdkit_add_extent (extents, e.offset, e.length, e.type);
+   }
+   nbdkit_extents_free (extents2);
+ }
+
+If there is an error, C<.extents> should call C<nbdkit_error> with
an
+error message B<and> return -1 with C<err> set to the positive
errno
+value to return to the client.
+
+=head3 Allocating and freeing nbdkit_extents list
+
+Two functions are provided to filters only for allocating and freeing
+the map:
+
+ struct nbdkit_extents *nbdkit_extents_new (uint64_t start);
+
+Allocates and returns a new, empty extents list.  The C<start>
+parameter is the start of the range described in the list.  Normally
+you would pass in C<offset> as this parameter, but for filters which
+adjust offsets, they should pass in the adjusted offset.
+
+On error this function can return C<NULL>.  In this case it calls
+C<nbdkit_error> and/or C<nbdkit_set_error> as required.
+
+ void nbdkit_extents_free (struct nbdkit_extents *);
+
+Frees an existing extents list.
+
+=head3 Iterating over nbdkit_extents list
+
+Two functions are provided to filters only to iterate over the extents
+in order:
+
+ size_t nbdkit_extents_count (const struct nbdkit_extents *);
+
+Returns the number of extents in the list.
+
+ struct nbdkit_extent {
+   uint64_t offset;
+   uint64_t length;
+   uint32_t type;
+ };
+ struct nbdkit_extent nbdkit_get_extent (const struct nbdkit_extents *,
+                                         size_t i);
+
+Returns a copy of the C<i>'th extent.
+
 =head1 ERROR HANDLING
 
 If there is an error in the filter itself, the filter should call
diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod
index 47545f3..332c131 100644
--- a/docs/nbdkit-plugin.pod
+++ b/docs/nbdkit-plugin.pod
@@ -555,6 +555,20 @@ This callback is not required.  If omitted, then nbdkit
always tries
 C<.zero> first if it is present, and gracefully falls back to
 C<.pwrite> if C<.zero> was absent or failed with
C<EOPNOTSUPP>.
 
+=head2 C<.can_extents>
+
+ int can_extents (void *handle);
+
+This is called during the option negotiation phase to find out if the
+plugin supports detecting allocated (non-sparse) regions of the disk
+with the C<.extents> callback.
+
+If there is an error, C<.can_extents> should call C<nbdkit_error>
with
+an error message and return C<-1>.
+
+This callback is not required.  If omitted, then we return true iff a
+C<.extents> callback has been defined.
+
 =head2 C<.can_fua>
 
  int can_fua (void *handle);
@@ -717,6 +731,88 @@ If there is an error, C<.zero> should call
C<nbdkit_error> with an
 error message, and C<nbdkit_set_error> to record an appropriate error
 (unless C<errno> is sufficient), then return C<-1>.
 
+=head2 C<.extents>
+
+ int extents (void *handle, uint32_t count, uint64_t offset,
+              uint32_t flags, struct nbdkit_extents *extents);
+
+During the data serving phase, this callback is used to detect
+allocated, sparse and zeroed regions of the disk.
+
+This function will not be called if C<.can_extents> returned false.
+nbdkit's default behaviour in this case is to treat the whole virtual
+disk as if it was allocated.
+
+The callback should detect and return the list of extents overlapping
+the range C<[offset...offset+count-1]>.  The C<extents> parameter
+points to an opaque object which the callback should fill in by
+calling C<nbdkit_add_extent>.  See L</Extents list> below.
+
+If there is an error, C<.extents> should call C<nbdkit_error> with
an
+error message, and C<nbdkit_set_error> to record an appropriate error
+(unless C<errno> is sufficient), then return C<-1>.
+
+=head3 Extents list
+
+The plugin C<extents> callback is passed an opaque pointer C<struct
+nbdkit_extents *extents>.  This structure represents a list of
+L<filesystem extents|https://en.wikipedia.org/wiki/Extent_(file_systems)>
+describing which areas of the disk are allocated, which are sparse
+(“holes”), and, if supported, which are zeroes.
+
+The C<extents> callback should scan the disk starting at C<offset>
and
+call C<nbdkit_add_extent> for each extent found.
+
+Extents overlapping the range C<[offset...offset+count-1]> should be
+returned if possible.  However nbdkit ignores extents E<lt> offset so
+the plugin may, if it is easier to implement, return all extent
+information for the whole disk.  The plugin may return extents beyond
+the end of the range.  It may also return extent information for less
+than the whole range, but it must return at least one extent
+overlapping C<offset>.
+
+The extents B<must> be added in ascending order, and B<must> be
+contiguous.
+
+The C<flags> parameter may contain the flag C<NBDKIT_FLAG_REQ_ONE>
+which means that the client is only requesting information about the
+extent overlapping C<offset>.  The plugin may ignore this flag, or as
+an optimization it may return just a single extent for C<offset>.
+
+ int nbdkit_add_extent (struct nbdkit_extents *extents,
+                        uint64_t offset, uint64_t length, uint32_t type);
+
+Add an extent covering C<[offset...offset+length-1]> of one of
+the following four types:
+
+=over 4
+
+=item C<type = 0>
+
+A normal, allocated data extent.
+
+=item C<type = NBDKIT_EXTENT_HOLE|NBDKIT_EXTENT_ZERO>
+
+An unallocated extent, a.k.a. a “hole”, which reads back as zeroes.
+This is the normal type of hole applicable to most disks.
+
+=item C<type = NBDKIT_EXTENT_ZERO>
+
+An allocated extent which is known to contain only zeroes.
+
+=item C<type = NBDKIT_EXTENT_HOLE>
+
+An unallocated extent (hole) which does not read back as zeroes.  Note
+this should only be used in specialized circumstances such as when
+writing a plugin for (or to emulate) certain SCSI drives which do not
+guarantee that trimmed blocks read back as zeroes.
+
+=back
+
+C<nbdkit_extent_add> returns C<0> on success or C<-1> on
failure.  On
+failure C<nbdkit_error> and/or C<nbdkit_set_error> has already been
+called.
+
 =head1 THREADS
 
 Each nbdkit plugin must declare its thread safety model by defining
diff --git a/include/nbdkit-common.h b/include/nbdkit-common.h
index cb9954e..7c69b14 100644
--- a/include/nbdkit-common.h
+++ b/include/nbdkit-common.h
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2013-2018 Red Hat Inc.
+ * Copyright (C) 2013-2019 Red Hat Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -60,11 +60,15 @@ extern "C" {
 
 #define NBDKIT_FLAG_MAY_TRIM (1<<0) /* Maps to !NBD_CMD_FLAG_NO_HOLE */
 #define NBDKIT_FLAG_FUA      (1<<1) /* Maps to NBD_CMD_FLAG_FUA */
+#define NBDKIT_FLAG_REQ_ONE  (1<<2) /* Maps to NBD_CMD_FLAG_REQ_ONE */
 
 #define NBDKIT_FUA_NONE       0
 #define NBDKIT_FUA_EMULATE    1
 #define NBDKIT_FUA_NATIVE     2
 
+#define NBDKIT_EXTENT_HOLE    (1<<0) /* Same as NBD_STATE_HOLE */
+#define NBDKIT_EXTENT_ZERO    (1<<1) /* Same as NBD_STATE_ZERO */
+
 extern void nbdkit_error (const char *msg, ...) ATTRIBUTE_FORMAT_PRINTF (1, 2);
 extern void nbdkit_verror (const char *msg, va_list args);
 extern void nbdkit_debug (const char *msg, ...) ATTRIBUTE_FORMAT_PRINTF (1, 2);
@@ -76,6 +80,10 @@ extern int nbdkit_parse_bool (const char *str);
 extern int nbdkit_read_password (const char *value, char **password);
 extern char *nbdkit_realpath (const char *path);
 
+struct nbdkit_extents;
+extern int nbdkit_add_extent (struct nbdkit_extents *,
+                              uint64_t offset, uint64_t length, uint32_t type);
+
 /* A static non-NULL pointer which can be used when you don't need a
  * per-connection handle.
  */
diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h
index 71c06c8..8289bb0 100644
--- a/include/nbdkit-filter.h
+++ b/include/nbdkit-filter.h
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2013-2018 Red Hat Inc.
+ * Copyright (C) 2013-2019 Red Hat Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -44,6 +44,18 @@ extern "C" {
 
 #define NBDKIT_FILTER_API_VERSION 2
 
+struct nbdkit_extent {
+  uint64_t offset;
+  uint64_t length;
+  uint32_t type;
+};
+
+extern struct nbdkit_extents *nbdkit_extents_new (uint64_t start);
+extern void nbdkit_extents_free (struct nbdkit_extents *);
+extern size_t nbdkit_extents_count (const struct nbdkit_extents *);
+extern struct nbdkit_extent nbdkit_get_extent (const struct nbdkit_extents *,
+                                               size_t);
+
 typedef int nbdkit_next_config (void *nxdata,
                                 const char *key, const char *value);
 typedef int nbdkit_next_config_complete (void *nxdata);
@@ -58,6 +70,7 @@ struct nbdkit_next_ops {
   int (*is_rotational) (void *nxdata);
   int (*can_trim) (void *nxdata);
   int (*can_zero) (void *nxdata);
+  int (*can_extents) (void *nxdata);
   int (*can_fua) (void *nxdata);
   int (*can_multi_conn) (void *nxdata);
 
@@ -71,6 +84,8 @@ struct nbdkit_next_ops {
                int *err);
   int (*zero) (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags,
                int *err);
+  int (*extents) (void *nxdata, uint32_t count, uint64_t offset, uint32_t
flags,
+                  struct nbdkit_extents *extents, int *err);
 };
 
 struct nbdkit_filter {
@@ -120,6 +135,8 @@ struct nbdkit_filter {
                    void *handle);
   int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata,
                    void *handle);
+  int (*can_extents) (struct nbdkit_next_ops *next_ops, void *nxdata,
+                      void *handle);
   int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata,
                   void *handle);
   int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata,
@@ -140,6 +157,9 @@ struct nbdkit_filter {
   int (*zero) (struct nbdkit_next_ops *next_ops, void *nxdata,
                void *handle, uint32_t count, uint64_t offset, uint32_t flags,
                int *err);
+  int (*extents) (struct nbdkit_next_ops *next_ops, void *nxdata,
+                  void *handle, uint32_t count, uint64_t offset, uint32_t
flags,
+                  struct nbdkit_extents *extents, int *err);
 };
 
 #define NBDKIT_REGISTER_FILTER(filter)                                  \
diff --git a/include/nbdkit-plugin.h b/include/nbdkit-plugin.h
index d43b2f5..20d193c 100644
--- a/include/nbdkit-plugin.h
+++ b/include/nbdkit-plugin.h
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2013-2018 Red Hat Inc.
+ * Copyright (C) 2013-2019 Red Hat Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -125,6 +125,10 @@ struct nbdkit_plugin {
   const char *magic_config_key;
 
   int (*can_multi_conn) (void *handle);
+
+  int (*can_extents) (void *handle);
+  int (*extents) (void *handle, uint32_t count, uint64_t offset, uint32_t
flags,
+                  struct nbdkit_extents *extents);
 };
 
 extern void nbdkit_set_error (int err);
diff --git a/server/internal.h b/server/internal.h
index d40a82d..71d2815 100644
--- a/server/internal.h
+++ b/server/internal.h
@@ -274,6 +274,7 @@ struct backend {
   int (*is_rotational) (struct backend *, struct connection *conn);
   int (*can_trim) (struct backend *, struct connection *conn);
   int (*can_zero) (struct backend *, struct connection *conn);
+  int (*can_extents) (struct backend *, struct connection *conn);
   int (*can_fua) (struct backend *, struct connection *conn);
   int (*can_multi_conn) (struct backend *, struct connection *conn);
 
@@ -287,6 +288,9 @@ struct backend {
                uint64_t offset, uint32_t flags, int *err);
   int (*zero) (struct backend *, struct connection *conn, uint32_t count,
                uint64_t offset, uint32_t flags, int *err);
+  int (*extents) (struct backend *, struct connection *conn, uint32_t count,
+                  uint64_t offset, uint32_t flags,
+                  struct nbdkit_extents *extents, int *err);
 };
 
 /* plugins.c */
diff --git a/server/extents.c b/server/extents.c
new file mode 100644
index 0000000..b5cf444
--- /dev/null
+++ b/server/extents.c
@@ -0,0 +1,186 @@
+/* nbdkit
+ * Copyright (C) 2019 Red Hat Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * * Neither the name of Red Hat nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS
IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+ * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <assert.h>
+
+#include "minmax.h"
+
+#include "internal.h"
+
+struct nbdkit_extents {
+  struct nbdkit_extent *extents;
+  size_t nr_extents, allocated;
+  uint64_t start;
+};
+
+struct nbdkit_extents *
+nbdkit_extents_new (uint64_t start)
+{
+  struct nbdkit_extents *r;
+
+  if (start >= INT64_MAX) {
+    nbdkit_error ("nbdkit_extents_new: start (%" PRIu64 ") >
INT64_MAX",
+                  start);
+    errno = ERANGE;
+    return NULL;
+  }
+
+  r = malloc (sizeof *r);
+  if (r == NULL) {
+    nbdkit_error ("nbdkit_extents_new: malloc: %m");
+    return NULL;
+  }
+  r->extents = NULL;
+  r->nr_extents = r->allocated = 0;
+  r->start = start;
+  return r;
+}
+
+void
+nbdkit_extents_free (struct nbdkit_extents *exts)
+{
+  if (exts) {
+    free (exts->extents);
+    free (exts);
+  }
+}
+
+size_t
+nbdkit_extents_count (const struct nbdkit_extents *exts)
+{
+  return exts->nr_extents;
+}
+
+const struct nbdkit_extent
+nbdkit_get_extent (const struct nbdkit_extents *exts, size_t i)
+{
+  assert (i < exts->nr_extents);
+  return exts->extents[i];
+}
+
+/* Insert *e in the list at the end. */
+static int
+append_extent (struct nbdkit_extents *exts, const struct nbdkit_extent *e)
+{
+  if (exts->nr_extents >= exts->allocated) {
+    size_t new_allocated;
+    struct nbdkit_extent *new_extents;
+
+    new_allocated = exts->allocated;
+    if (new_allocated == 0)
+      new_allocated = 1;
+    new_allocated *= 2;
+    new_extents +      realloc (exts->extents, new_allocated * sizeof
(struct nbdkit_extent));
+    if (new_extents == NULL) {
+      nbdkit_error ("nbdkit_add_extent: realloc: %m");
+      return -1;
+    }
+    exts->allocated = new_allocated;
+    exts->extents = new_extents;
+  }
+
+  exts->extents[exts->nr_extents] = *e;
+  exts->nr_extents++;
+  return 0;
+}
+
+int
+nbdkit_add_extent (struct nbdkit_extents *exts,
+                   uint64_t offset, uint64_t length, uint32_t type)
+{
+  uint64_t overlap;
+
+  if (length == 0)
+    return 0;
+
+  /* If there are existing extents, the new extent must be contiguous. */
+  if (exts->nr_extents > 0) {
+    const struct nbdkit_extent *ep;
+
+    ep = &exts->extents[exts->nr_extents-1];
+    if (offset != ep->offset + ep->length) {
+      nbdkit_error ("nbdkit_add_extent: "
+                    "extents must be added in ascending order and "
+                    "must be contiguous");
+      return -1;
+    }
+  }
+  else {
+    /* If there are no existing extents, and the new extent is
+     * entirely before start, ignore it.
+     */
+    if (offset + length <= exts->start)
+      return 0;
+
+    /* If there are no existing extents, and the new extent is after
+     * start, then this is a bug in the plugin.
+     */
+    if (offset > exts->start) {
+      nbdkit_error ("nbdkit_add_extent: "
+                    "first extent must not be > start (%" PRIu64
")",
+                    exts->start);
+      return -1;
+    }
+
+    /* If there are no existing extents, and the new extent overlaps
+     * start, truncate it so it starts at start.
+     */
+    overlap = exts->start - offset;
+    length -= overlap;
+    offset += overlap;
+  }
+
+  /* If we get here we are going to either add or extend. */
+  if (exts->nr_extents > 0 &&
+      exts->extents[exts->nr_extents-1].type == type) {
+    /* Coalesce with the last extent. */
+    exts->extents[exts->nr_extents-1].length += length;
+    return 0;
+  }
+  else {
+    /* Add a new extent. */
+    const struct nbdkit_extent e +      { .offset = offset, .length = length,
.type = type };
+    return append_extent (exts, &e);
+  }
+}
diff --git a/server/filters.c b/server/filters.c
index 5b7abc4..5095188 100644
--- a/server/filters.c
+++ b/server/filters.c
@@ -309,6 +309,13 @@ next_can_zero (void *nxdata)
   return b_conn->b->can_zero (b_conn->b, b_conn->conn);
 }
 
+static int
+next_can_extents (void *nxdata)
+{
+  struct b_conn *b_conn = nxdata;
+  return b_conn->b->can_extents (b_conn->b, b_conn->conn);
+}
+
 static int
 next_can_fua (void *nxdata)
 {
@@ -364,6 +371,15 @@ next_zero (void *nxdata, uint32_t count, uint64_t offset,
uint32_t flags,
   return b_conn->b->zero (b_conn->b, b_conn->conn, count, offset,
flags, err);
 }
 
+static int
+next_extents (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags,
+              struct nbdkit_extents *extents, int *err)
+{
+  struct b_conn *b_conn = nxdata;
+  return b_conn->b->extents (b_conn->b, b_conn->conn, count,
offset, flags,
+                             extents, err);
+}
+
 static struct nbdkit_next_ops next_ops = {
   .get_size = next_get_size,
   .can_write = next_can_write,
@@ -371,6 +387,7 @@ static struct nbdkit_next_ops next_ops = {
   .is_rotational = next_is_rotational,
   .can_trim = next_can_trim,
   .can_zero = next_can_zero,
+  .can_extents = next_can_extents,
   .can_fua = next_can_fua,
   .can_multi_conn = next_can_multi_conn,
   .pread = next_pread,
@@ -378,6 +395,7 @@ static struct nbdkit_next_ops next_ops = {
   .flush = next_flush,
   .trim = next_trim,
   .zero = next_zero,
+  .extents = next_extents,
 };
 
 static int
@@ -511,6 +529,21 @@ filter_can_zero (struct backend *b, struct connection
*conn)
     return f->backend.next->can_zero (f->backend.next, conn);
 }
 
+static int
+filter_can_extents (struct backend *b, struct connection *conn)
+{
+  struct backend_filter *f = container_of (b, struct backend_filter, backend);
+  void *handle = connection_get_handle (conn, f->backend.i);
+  struct b_conn nxdata = { .b = f->backend.next, .conn = conn };
+
+  debug ("%s: can_extents", f->name);
+
+  if (f->filter.can_extents)
+    return f->filter.can_extents (&next_ops, &nxdata, handle);
+  else
+    return f->backend.next->can_extents (f->backend.next, conn);
+}
+
 static int
 filter_can_fua (struct backend *b, struct connection *conn)
 {
@@ -646,6 +679,30 @@ filter_zero (struct backend *b, struct connection *conn,
                                   count, offset, flags, err);
 }
 
+static int
+filter_extents (struct backend *b, struct connection *conn,
+                uint32_t count, uint64_t offset, uint32_t flags,
+                struct nbdkit_extents *extents, int *err)
+{
+  struct backend_filter *f = container_of (b, struct backend_filter, backend);
+  void *handle = connection_get_handle (conn, f->backend.i);
+  struct b_conn nxdata = { .b = f->backend.next, .conn = conn };
+
+  assert (!(flags & ~NBDKIT_FLAG_REQ_ONE));
+
+  debug ("%s: extents count=%" PRIu32 " offset=%" PRIu64
" flags=0x%" PRIx32,
+         f->name, count, offset, flags);
+
+  if (f->filter.extents)
+    return f->filter.extents (&next_ops, &nxdata, handle,
+                              count, offset, flags,
+                              extents, err);
+  else
+    return f->backend.next->extents (f->backend.next, conn,
+                                     count, offset, flags,
+                                     extents, err);
+}
+
 static struct backend filter_functions = {
   .free = filter_free,
   .thread_model = filter_thread_model,
@@ -667,6 +724,7 @@ static struct backend filter_functions = {
   .is_rotational = filter_is_rotational,
   .can_trim = filter_can_trim,
   .can_zero = filter_can_zero,
+  .can_extents = filter_can_extents,
   .can_fua = filter_can_fua,
   .can_multi_conn = filter_can_multi_conn,
   .pread = filter_pread,
@@ -674,6 +732,7 @@ static struct backend filter_functions = {
   .flush = filter_flush,
   .trim = filter_trim,
   .zero = filter_zero,
+  .extents = filter_extents,
 };
 
 /* Register and load a filter. */
diff --git a/server/plugins.c b/server/plugins.c
index 28b96ad..0b0fe77 100644
--- a/server/plugins.c
+++ b/server/plugins.c
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2013-2018 Red Hat Inc.
+ * Copyright (C) 2013-2019 Red Hat Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -199,6 +199,8 @@ plugin_dump_fields (struct backend *b)
   HAS (trim);
   HAS (zero);
   HAS (can_multi_conn);
+  HAS (can_extents);
+  HAS (extents);
 #undef HAS
 
   /* Custom fields. */
@@ -392,6 +394,21 @@ plugin_can_zero (struct backend *b, struct connection
*conn)
   return plugin_can_write (b, conn);
 }
 
+static int
+plugin_can_extents (struct backend *b, struct connection *conn)
+{
+  struct backend_plugin *p = container_of (b, struct backend_plugin, backend);
+
+  assert (connection_get_handle (conn, 0));
+
+  debug ("can_extents");
+
+  if (p->plugin.can_extents)
+    return p->plugin.can_extents (connection_get_handle (conn, 0));
+  else
+    return p->plugin.extents != NULL;
+}
+
 static int
 plugin_can_fua (struct backend *b, struct connection *conn)
 {
@@ -651,6 +668,39 @@ plugin_zero (struct backend *b, struct connection *conn,
   return r;
 }
 
+static int
+plugin_extents (struct backend *b, struct connection *conn,
+                uint32_t count, uint64_t offset, uint32_t flags,
+                struct nbdkit_extents *extents, int *err)
+{
+  struct backend_plugin *p = container_of (b, struct backend_plugin, backend);
+  bool req_one = flags & NBDKIT_FLAG_REQ_ONE;
+  int r;
+
+  assert (connection_get_handle (conn, 0));
+  assert (!(flags & ~NBDKIT_FLAG_REQ_ONE));
+
+  /* This should be true because plugin_can_extents checks it. */
+  assert (p->plugin.extents);
+
+  debug ("extents count=%" PRIu32 " offset=%" PRIu64 "
req_one=%d",
+         count, offset, req_one);
+
+  if (!count)
+    return 0;
+
+  r = p->plugin.extents (connection_get_handle (conn, 0), count, offset,
+                         flags, extents);
+  if (r >= 0 && nbdkit_extents_count (extents) < 1) {
+    nbdkit_error ("extents: plugin must return at least one extent");
+    nbdkit_set_error (EINVAL);
+    r = -1;
+  }
+  if (r == -1)
+    *err = get_error (p);
+  return r;
+}
+
 static struct backend plugin_functions = {
   .free = plugin_free,
   .thread_model = plugin_thread_model,
@@ -672,6 +722,7 @@ static struct backend plugin_functions = {
   .is_rotational = plugin_is_rotational,
   .can_trim = plugin_can_trim,
   .can_zero = plugin_can_zero,
+  .can_extents = plugin_can_extents,
   .can_fua = plugin_can_fua,
   .can_multi_conn = plugin_can_multi_conn,
   .pread = plugin_pread,
@@ -679,6 +730,7 @@ static struct backend plugin_functions = {
   .flush = plugin_flush,
   .trim = plugin_trim,
   .zero = plugin_zero,
+  .extents = plugin_extents,
 };
 
 /* Register and load a plugin. */
diff --git a/server/Makefile.am b/server/Makefile.am
index be70700..5ee662e 100644
--- a/server/Makefile.am
+++ b/server/Makefile.am
@@ -43,6 +43,7 @@ nbdkit_SOURCES = \
 	connections.c \
 	crypto.c \
 	debug.c \
+	extents.c \
 	filters.c \
 	internal.h \
 	locks.c \
diff --git a/server/nbdkit.syms b/server/nbdkit.syms
index 672abd2..240953e 100644
--- a/server/nbdkit.syms
+++ b/server/nbdkit.syms
@@ -40,8 +40,13 @@
   # The functions we want plugins and filters to call.
   global:
     nbdkit_absolute_path;
+    nbdkit_add_extent;
     nbdkit_debug;
     nbdkit_error;
+    nbdkit_extents_count;
+    nbdkit_extents_free;
+    nbdkit_extents_new;
+    nbdkit_get_extent;
     nbdkit_parse_bool;
     nbdkit_parse_size;
     nbdkit_read_password;
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 02/15] server: Add CLEANUP_EXTENTS_FREE macro.
Provides automatic cleanup of ‘struct nbdkit_extents_map *’ on exit
from a scope or function.
---
 server/internal.h  | 2 ++
 server/cleanup.c   | 8 +++++++-
 server/Makefile.am | 3 ++-
 3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/server/internal.h b/server/internal.h
index 71d2815..ae51804 100644
--- a/server/internal.h
+++ b/server/internal.h
@@ -139,6 +139,8 @@ extern void change_user (void);
 /* cleanup.c */
 extern void cleanup_free (void *ptr);
 #define CLEANUP_FREE __attribute__((cleanup (cleanup_free)))
+extern void cleanup_extents_free (void *ptr);
+#define CLEANUP_EXTENTS_FREE __attribute__((cleanup (cleanup_extents_free)))
 extern void cleanup_unlock (pthread_mutex_t **ptr);
 #define CLEANUP_UNLOCK __attribute__((cleanup (cleanup_unlock)))
 #define ACQUIRE_LOCK_FOR_CURRENT_SCOPE(mutex) \
diff --git a/server/cleanup.c b/server/cleanup.c
index c29ecec..1eec613 100644
--- a/server/cleanup.c
+++ b/server/cleanup.c
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2013-2018 Red Hat Inc.
+ * Copyright (C) 2013-2019 Red Hat Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -46,6 +46,12 @@ cleanup_free (void *ptr)
   free (* (void **) ptr);
 }
 
+void
+cleanup_extents_free (void *ptr)
+{
+  nbdkit_extents_free (* (void **) ptr);
+}
+
 void
 cleanup_unlock (pthread_mutex_t **ptr)
 {
diff --git a/server/Makefile.am b/server/Makefile.am
index 5ee662e..d50ebe4 100644
--- a/server/Makefile.am
+++ b/server/Makefile.am
@@ -125,7 +125,8 @@ check_PROGRAMS = test-utils
 test_utils_SOURCES = \
 	test-utils.c \
 	utils.c \
-	cleanup.c
+	cleanup.c \
+	extents.c
 test_utils_CPPFLAGS = \
 	-I$(top_srcdir)/include \
 	-I$(top_srcdir)/common/include
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 03/15] server: Implement Block Status requests to read allocation status.
This commit implements the core NBD protocol for the "base:allocation"
Block Status replies.
---
 common/protocol/protocol.h           |  15 ++
 server/internal.h                    |   8 +
 server/protocol-handshake-newstyle.c |  63 ++++++-
 server/protocol-handshake.c          |  11 ++
 server/protocol.c                    | 240 +++++++++++++++++++++++++--
 TODO                                 |   4 +-
 6 files changed, 324 insertions(+), 17 deletions(-)
diff --git a/common/protocol/protocol.h b/common/protocol/protocol.h
index 06b917e..a7de2f0 100644
--- a/common/protocol/protocol.h
+++ b/common/protocol/protocol.h
@@ -112,6 +112,7 @@ extern const char *name_of_nbd_rep (int);
 #define NBD_REP_ACK          1
 #define NBD_REP_SERVER       2
 #define NBD_REP_INFO         3
+#define NBD_REP_META_CONTEXT 4
 #define NBD_REP_ERR_UNSUP    0x80000001
 #define NBD_REP_ERR_POLICY   0x80000002
 #define NBD_REP_ERR_INVALID  0x80000003
@@ -128,6 +129,18 @@ struct fixed_new_option_reply_info_export {
   uint16_t eflags;              /* per-export flags */
 } __attribute__((packed));
 
+/* NBD_REP_META_CONTEXT reply (follows fixed_new_option_reply). */
+struct fixed_new_option_reply_meta_context {
+  uint32_t context_id;          /* metadata context ID */
+  /* followed by a string */
+} __attribute__((packed));
+
+/* NBD_REPLY_TYPE_BLOCK_STATUS block descriptor. */
+struct block_descriptor {
+  uint32_t length;              /* length of block */
+  uint32_t status_flags;        /* block type (hole etc) */
+} __attribute__((packed));
+
 /* New-style handshake server reply when using NBD_OPT_EXPORT_NAME.
  * Modern clients use NBD_OPT_GO instead of this.
  */
@@ -199,10 +212,12 @@ extern const char *name_of_nbd_cmd (int);
 #define NBD_CMD_FLUSH             3
 #define NBD_CMD_TRIM              4
 #define NBD_CMD_WRITE_ZEROES      6
+#define NBD_CMD_BLOCK_STATUS      7
 
 extern const char *name_of_nbd_cmd_flag (int);
 #define NBD_CMD_FLAG_FUA      (1<<0)
 #define NBD_CMD_FLAG_NO_HOLE  (1<<1)
+#define NBD_CMD_FLAG_REQ_ONE  (1<<3)
 
 /* Error codes (previously errno).
  * See
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=ca4414804114fd0095b317785bc0b51862e62ebb
diff --git a/server/internal.h b/server/internal.h
index ae51804..d804441 100644
--- a/server/internal.h
+++ b/server/internal.h
@@ -181,8 +181,10 @@ struct connection {
   bool can_zero;
   bool can_fua;
   bool can_multi_conn;
+  bool can_extents;
   bool using_tls;
   bool structured_replies;
+  bool meta_context_base_allocation;
 
   int sockin, sockout;
   connection_recv_function recv;
@@ -219,6 +221,12 @@ extern int protocol_handshake_newstyle (struct connection
*conn)
 extern int protocol_recv_request_send_reply (struct connection *conn)
   __attribute__((__nonnull__ (1)));
 
+/* The context ID of base:allocation.  As far as I can tell it doesn't
+ * matter what this is as long as nbdkit always returns the same
+ * number.
+ */
+#define base_allocation_id 1
+
 /* crypto.c */
 #define root_tls_certificates_dir sysconfdir "/pki/" PACKAGE_NAME
 extern void crypto_init (bool tls_set_on_cli);
diff --git a/server/protocol-handshake-newstyle.c
b/server/protocol-handshake-newstyle.c
index db01f7b..6899e6c 100644
--- a/server/protocol-handshake-newstyle.c
+++ b/server/protocol-handshake-newstyle.c
@@ -133,6 +133,34 @@ send_newstyle_option_reply_info_export (struct connection
*conn,
   return 0;
 }
 
+static int
+send_newstyle_option_reply_meta_context (struct connection *conn,
+                                         uint32_t option, uint32_t reply,
+                                         uint32_t context_id,
+                                         const char *name)
+{
+  struct fixed_new_option_reply fixed_new_option_reply;
+  struct fixed_new_option_reply_meta_context context;
+  const size_t namelen = strlen (name);
+
+  fixed_new_option_reply.magic = htobe64 (NBD_REP_MAGIC);
+  fixed_new_option_reply.option = htobe32 (option);
+  fixed_new_option_reply.reply = htobe32 (reply);
+  fixed_new_option_reply.replylen = htobe32 (sizeof context + namelen);
+  context.context_id = htobe32 (context_id);
+
+  if (conn->send (conn,
+                  &fixed_new_option_reply,
+                  sizeof fixed_new_option_reply) == -1 ||
+      conn->send (conn, &context, sizeof context) == -1 ||
+      conn->send (conn, name, namelen) == -1) {
+    nbdkit_error ("write: %m");
+    return -1;
+  }
+
+  return 0;
+}
+
 /* Sub-function during negotiate_handshake_newstyle, to uniformly handle
  * a client hanging up on a message boundary.
  */
@@ -503,7 +531,15 @@ negotiate_handshake_newstyle_options (struct connection
*conn)
          * for SET: nr_queries == 0 means reset all contexts
          */
         if (nr_queries == 0) {
-          /* Nothing is supported now. */
+          if (option == NBD_OPT_SET_META_CONTEXT)
+            conn->meta_context_base_allocation = false;
+          else /* LIST */ {
+            if (send_newstyle_option_reply_meta_context
+                (conn, option, NBD_REP_META_CONTEXT,
+                 0, "base:allocation") == -1)
+              return -1;
+          }
+
           if (send_newstyle_option_reply (conn, option, NBD_REP_ACK) == -1)
             return -1;
         }
@@ -525,7 +561,30 @@ negotiate_handshake_newstyle_options (struct connection
*conn)
                    option == NBD_OPT_LIST_META_CONTEXT ? "query" :
"set",
                    (int) querylen, &data[opt_index]);
 
-            /* Ignore query - nothing is supported. */
+            /* For LIST, "base:" returns all supported contexts in
the
+             * base namespace.  We only support "base:allocation".
+             */
+            if (option == NBD_OPT_LIST_META_CONTEXT &&
+                querylen == 5 &&
+                strncmp (&data[opt_index], "base:", 5) == 0) {
+              if (send_newstyle_option_reply_meta_context
+                  (conn, option, NBD_REP_META_CONTEXT,
+                   0, "base:allocation") == -1)
+                return -1;
+            }
+            /* "base:allocation" requested by name. */
+            else if (querylen == 15 &&
+                     strncmp (&data[opt_index],
"base:allocation", 15) == 0) {
+              if (send_newstyle_option_reply_meta_context
+                  (conn, option, NBD_REP_META_CONTEXT,
+                   option == NBD_OPT_SET_META_CONTEXT
+                   ? base_allocation_id : 0,
+                   "base:allocation") == -1)
+                return -1;
+              if (option == NBD_OPT_SET_META_CONTEXT)
+                conn->meta_context_base_allocation = true;
+            }
+            /* Every other query must be ignored. */
 
             opt_index += querylen;
             nr_queries--;
diff --git a/server/protocol-handshake.c b/server/protocol-handshake.c
index 79a5999..9653210 100644
--- a/server/protocol-handshake.c
+++ b/server/protocol-handshake.c
@@ -110,6 +110,17 @@ protocol_compute_eflags (struct connection *conn, uint16_t
*flags)
     conn->can_multi_conn = true;
   }
 
+  /* The result of this is not returned to callers here (or at any
+   * time during the handshake).  However it makes sense to do it once
+   * per connection and store the result in the handle anyway.  This
+   * protocol_compute_eflags function is a bit misnamed XXX.
+   */
+  fl = backend->can_extents (backend, conn);
+  if (fl == -1)
+    return -1;
+  if (fl)
+    conn->can_extents = true;
+
   *flags = eflags;
   return 0;
 }
diff --git a/server/protocol.c b/server/protocol.c
index f117d42..baa1f16 100644
--- a/server/protocol.c
+++ b/server/protocol.c
@@ -36,6 +36,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <stdint.h>
+#include <stdbool.h>
 #include <inttypes.h>
 #include <string.h>
 #include <unistd.h>
@@ -44,6 +45,7 @@
 
 #include "internal.h"
 #include "byte-swapping.h"
+#include "minmax.h"
 #include "protocol.h"
 
 /* Maximum read or write request that we will handle. */
@@ -78,6 +80,7 @@ validate_request (struct connection *conn,
   case NBD_CMD_WRITE:
   case NBD_CMD_TRIM:
   case NBD_CMD_WRITE_ZEROES:
+  case NBD_CMD_BLOCK_STATUS:
     if (!valid_range (conn, offset, count)) {
       /* XXX Allow writes to extend the disk? */
       nbdkit_error ("invalid request: %s: offset and count are out of
range: "
@@ -106,7 +109,8 @@ validate_request (struct connection *conn,
   }
 
   /* Validate flags */
-  if (flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
+  if (flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE |
+                NBD_CMD_FLAG_REQ_ONE)) {
     nbdkit_error ("invalid request: unknown flag (0x%x)", flags);
     *error = EINVAL;
     return false;
@@ -117,6 +121,12 @@ validate_request (struct connection *conn,
     *error = EINVAL;
     return false;
   }
+  if ((flags & NBD_CMD_FLAG_REQ_ONE) &&
+      cmd != NBD_CMD_BLOCK_STATUS) {
+    nbdkit_error ("invalid request: REQ_ONE flag needs BLOCK_STATUS
request");
+    *error = EINVAL;
+    return false;
+  }
   if (!conn->can_fua && (flags & NBD_CMD_FLAG_FUA)) {
     nbdkit_error ("invalid request: FUA flag not supported");
     *error = EINVAL;
@@ -157,14 +167,37 @@ validate_request (struct connection *conn,
     return false;
   }
 
+  /* Block status allowed? */
+  if (cmd == NBD_CMD_BLOCK_STATUS) {
+    if (!conn->structured_replies) {
+      nbdkit_error ("invalid request: "
+                    "%s: structured replies was not negotiated",
+                    name_of_nbd_cmd (cmd));
+      *error = EINVAL;
+      return false;
+    }
+    if (!conn->meta_context_base_allocation) {
+      nbdkit_error ("invalid request: "
+                    "%s: base:allocation was not negotiated",
+                    name_of_nbd_cmd (cmd));
+      *error = EINVAL;
+      return false;
+    }
+  }
+
   return true;                     /* Command validates. */
 }
 
 /* This is called with the request lock held to actually execute the
  * request (by calling the plugin).  Note that the request fields have
  * been validated already in 'validate_request' so we don't have to
- * check them again.  'buf' is either the data to be written or the
- * data to be returned, and points to a buffer of size 'count' bytes.
+ * check them again.
+ *
+ * 'buf' is either the data to be written or the data to be returned,
+ * and points to a buffer of size 'count' bytes.
+ *
+ * 'extents' is an empty extents list used for block status requests
+ * only.
  *
  * In all cases, the return value is the system errno value that will
  * later be converted to the nbd error to send back to the client (0
@@ -173,7 +206,7 @@ validate_request (struct connection *conn,
 static uint32_t
 handle_request (struct connection *conn,
                 uint16_t cmd, uint16_t flags, uint64_t offset, uint32_t count,
-                void *buf)
+                void *buf, struct nbdkit_extents *extents)
 {
   uint32_t f = 0;
   bool fua = conn->can_fua && (flags & NBD_CMD_FLAG_FUA);
@@ -217,6 +250,33 @@ handle_request (struct connection *conn,
       return err;
     break;
 
+  case NBD_CMD_BLOCK_STATUS:
+    /* The other backend methods don't check can_*.  That is because
+     * those methods are implicitly suppressed by returning eflags to
+     * the client.  However there is no eflag for extents so we must
+     * check it here.
+     */
+    if (conn->can_extents) {
+      if (flags & NBD_CMD_FLAG_REQ_ONE)
+        f |= NBDKIT_FLAG_REQ_ONE;
+      if (backend->extents (backend, conn, count, offset, f,
+                            extents, &err) == -1)
+        return err;
+    }
+    else {
+      int r;
+
+      /* By default it is safe assume that everything in the range is
+       * allocated.
+       */
+      errno = 0;
+      r = nbdkit_add_extent (extents, offset, count, 0 /* allocated data */);
+      if (r == -1)
+        return errno ? errno : EINVAL;
+      return 0;
+    }
+    break;
+
   default:
     abort ();
   }
@@ -359,6 +419,143 @@ send_structured_reply_read (struct connection *conn,
   return 1;                     /* command processed ok */
 }
 
+/* Convert a list of extents into NBD_REPLY_TYPE_BLOCK_STATUS blocks.
+ * The rules here are very complicated.  Read the spec carefully!
+ */
+static struct block_descriptor *
+extents_to_block_descriptors (struct nbdkit_extents *extents,
+                              uint16_t flags,
+                              uint32_t count, uint64_t offset,
+                              size_t *nr_blocks)
+{
+  const bool req_one = flags & NBD_CMD_FLAG_REQ_ONE;
+  const size_t nr_extents = nbdkit_extents_count (extents);
+  size_t i;
+  struct block_descriptor *blocks;
+
+  /* This is checked in server/plugins.c. */
+  assert (nr_extents >= 1);
+
+  /* We may send fewer than nr_extents blocks, but never more. */
+  blocks = calloc (req_one ? 1 : nr_extents, sizeof (struct block_descriptor));
+  if (blocks == NULL) {
+    nbdkit_error ("malloc");
+    return NULL;
+  }
+
+  if (req_one) {
+    const struct nbdkit_extent e = nbdkit_get_extent (extents, 0);
+
+    /* Checked as a side effect of how the extent list is created. */
+    assert (e.length > 0);
+
+    *nr_blocks = 1;
+
+    /* Must not exceed count of the original request. */
+    blocks[0].length = MIN (e.length, (uint64_t) count);
+    blocks[0].status_flags = e.type & 3;
+  }
+  else {
+    uint64_t pos = offset;
+
+    for (i = 0; i < nr_extents; ++i) {
+      const struct nbdkit_extent e = nbdkit_get_extent (extents, i);
+      uint64_t length;
+
+      if (i == 0)
+        assert (e.offset == offset);
+
+      /* Must not exceed UINT32_MAX. */
+      length = MIN (e.length, UINT32_MAX);
+      blocks[i].status_flags = e.type & 3;
+
+      pos += length;
+      if (pos > offset + count) /* this must be the last block */
+        break;
+
+      /* If we reach here then we must have consumed this whole
+       * extent.  This is currently true because the server only sends
+       * 32 bit requests, but if we move to 64 bit requests we will
+       * need to revisit this code so it can split extents into
+       * multiple blocks.  XXX
+       */
+      assert (e.length <= length);
+    }
+
+    *nr_blocks = i;
+  }
+
+#if 0
+  for (i = 0; i < *nr_blocks; ++i)
+    nbdkit_debug ("block status: sending block %" PRIu32 " type
%" PRIu32,
+                  blocks[i].length, blocks[i].status_flags);
+#endif
+
+  /* Convert to big endian for the protocol. */
+  for (i = 0; i < *nr_blocks; ++i) {
+    blocks[i].length = htobe32 (blocks[i].length);
+    blocks[i].status_flags = htobe32 (blocks[i].status_flags);
+  }
+
+  return blocks;
+}
+
+static int
+send_structured_reply_block_status (struct connection *conn,
+                                    uint64_t handle,
+                                    uint16_t cmd, uint16_t flags,
+                                    uint32_t count, uint64_t offset,
+                                    struct nbdkit_extents *extents)
+{
+  ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&conn->write_lock);
+  struct structured_reply reply;
+  CLEANUP_FREE struct block_descriptor *blocks = NULL;
+  size_t nr_blocks;
+  uint32_t context_id;
+  size_t i;
+  int r;
+
+  assert (conn->meta_context_base_allocation);
+  assert (cmd == NBD_CMD_BLOCK_STATUS);
+
+  blocks = extents_to_block_descriptors (extents, flags, count, offset,
+                                         &nr_blocks);
+  if (blocks == NULL)
+    return connection_set_status (conn, -1);
+
+  reply.magic = htobe32 (NBD_STRUCTURED_REPLY_MAGIC);
+  reply.handle = handle;
+  reply.flags = htobe16 (NBD_REPLY_FLAG_DONE);
+  reply.type = htobe16 (NBD_REPLY_TYPE_BLOCK_STATUS);
+  reply.length = htobe32 (sizeof context_id +
+                          nr_blocks * sizeof (struct block_descriptor));
+
+  r = conn->send (conn, &reply, sizeof reply);
+  if (r == -1) {
+    nbdkit_error ("write reply: %s: %m", name_of_nbd_cmd (cmd));
+    return connection_set_status (conn, -1);
+  }
+
+  /* Send the base:allocation context ID. */
+  context_id = htobe32 (base_allocation_id);
+  r = conn->send (conn, &context_id, sizeof context_id);
+  if (r == -1) {
+    nbdkit_error ("write reply: %s: %m", name_of_nbd_cmd (cmd));
+    return connection_set_status (conn, -1);
+  }
+
+  /* Send each block descriptor. */
+  for (i = 0; i < nr_blocks; ++i) {
+    r = conn->send (conn, &blocks[i], sizeof blocks[i]);
+    if (r == -1) {
+      nbdkit_error ("write reply: %s: %m", name_of_nbd_cmd (cmd));
+      return connection_set_status (conn, -1);
+    }
+  }
+
+  return 1;                     /* command processed ok */
+}
+
 static int
 send_structured_reply_error (struct connection *conn,
                              uint64_t handle, uint16_t cmd, uint32_t error)
@@ -402,6 +599,7 @@ protocol_recv_request_send_reply (struct connection *conn)
   uint32_t magic, count, error = 0;
   uint64_t offset;
   CLEANUP_FREE char *buf = NULL;
+  CLEANUP_EXTENTS_FREE struct nbdkit_extents *extents = NULL;
 
   /* Read the request packet. */
   {
@@ -449,6 +647,7 @@ protocol_recv_request_send_reply (struct connection *conn)
     if (cmd == NBD_CMD_READ || cmd == NBD_CMD_WRITE) {
       buf = malloc (count);
       if (buf == NULL) {
+      out_of_memory:
         perror ("malloc");
         error = ENOMEM;
         if (cmd == NBD_CMD_WRITE &&
@@ -458,6 +657,13 @@ protocol_recv_request_send_reply (struct connection *conn)
       }
     }
 
+    /* Allocate the extents list for block status only. */
+    if (cmd == NBD_CMD_BLOCK_STATUS) {
+      extents = nbdkit_extents_new (offset);
+      if (extents == NULL)
+        goto out_of_memory;
+    }
+
     /* Receive the write data buffer. */
     if (cmd == NBD_CMD_WRITE) {
       r = conn->recv (conn, buf, count);
@@ -478,7 +684,7 @@ protocol_recv_request_send_reply (struct connection *conn)
   }
   else {
     lock_request (conn);
-    error = handle_request (conn, cmd, flags, offset, count, buf);
+    error = handle_request (conn, cmd, flags, offset, count, buf, extents);
     assert ((int) error >= 0);
     unlock_request (conn);
   }
@@ -498,15 +704,23 @@ protocol_recv_request_send_reply (struct connection *conn)
   }
 
   /* Currently we prefer to send simple replies for everything except
-   * where we have to (ie. NBD_CMD_READ when structured_replies have
-   * been negotiated).  However this prevents us from sending
-   * human-readable error messages to the client, so we should
-   * reconsider this in future.
+   * where we have to (ie. NBD_CMD_READ and NBD_CMD_BLOCK_STATUS when
+   * structured_replies have been negotiated).  However this prevents
+   * us from sending human-readable error messages to the client, so
+   * we should reconsider this in future.
    */
-  if (conn->structured_replies && cmd == NBD_CMD_READ) {
-    if (!error)
-      return send_structured_reply_read (conn, request.handle, cmd,
-                                         buf, count, offset);
+  if (conn->structured_replies &&
+      (cmd == NBD_CMD_READ || cmd == NBD_CMD_BLOCK_STATUS)) {
+    if (!error) {
+      if (cmd == NBD_CMD_READ)
+        return send_structured_reply_read (conn, request.handle, cmd,
+                                           buf, count, offset);
+      else /* NBD_CMD_BLOCK_STATUS */
+        return send_structured_reply_block_status (conn, request.handle,
+                                                   cmd, flags,
+                                                   count, offset,
+                                                   extents);
+    }
     else
       return send_structured_reply_error (conn, request.handle, cmd, error);
   }
diff --git a/TODO b/TODO
index 7cbc238..c968c2d 100644
--- a/TODO
+++ b/TODO
@@ -24,8 +24,8 @@ General ideas for improvements
   to inform nbdkit when the response is ready:
   https://www.redhat.com/archives/libguestfs/2018-January/msg00149.html
 
-* More NBD protocol features. In the upstream pipeline: proposals for
-  block status and online resize.
+* More NBD protocol features.  The only currently missing feature is
+  online resize.
 
 * Add a callback to let plugins request minimum alignment for the
   buffer to pread/pwrite; useful for a plugin utilizing O_DIRECT or
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 04/15] blocksize: Implement extents.
---
 filters/blocksize/nbdkit-blocksize-filter.pod |  4 ++--
 filters/blocksize/blocksize.c                 | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/filters/blocksize/nbdkit-blocksize-filter.pod
b/filters/blocksize/nbdkit-blocksize-filter.pod
index b6cf4dd..0abed2f 100644
--- a/filters/blocksize/nbdkit-blocksize-filter.pod
+++ b/filters/blocksize/nbdkit-blocksize-filter.pod
@@ -56,8 +56,8 @@ of 1024.
 
 =item B<maxlen=>SIZE
 
-The maximum length for any single transaction without data (trim and
-zero).  If omitted, this defaults to 0xffffffff rounded down to
+The maximum length for any single transaction without data (trim, zero
+or extents).  If omitted, this defaults to 0xffffffff rounded down to
 C<minsize> alignment (that is, the inherent 32-bit limit of the NBD
 protocol).  This need not be a power of two, but must be an integer
 multiple of C<minblock>, and should be at least as large as
diff --git a/filters/blocksize/blocksize.c b/filters/blocksize/blocksize.c
index 9e0e713..55b2d59 100644
--- a/filters/blocksize/blocksize.c
+++ b/filters/blocksize/blocksize.c
@@ -45,6 +45,7 @@
 #include <nbdkit-filter.h>
 
 #include "minmax.h"
+#include "rounding.h"
 
 /* XXX See design comment in filters/cow/cow.c. */
 #define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_ALL_REQUESTS
@@ -372,6 +373,23 @@ blocksize_zero (struct nbdkit_next_ops *next_ops, void
*nxdata,
   return 0;
 }
 
+static int
+blocksize_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
+                   void *handle, uint32_t count, uint64_t offset,
+                   uint32_t flags, struct nbdkit_extents *extents, int *err)
+{
+  /* Ask the plugin for blocksize-aligned data.  Since the extents
+   * list start is set to the real offset, everything before the
+   * offset is ignored automatically.  Also we only need to ask for
+   * maxlen of data, because it's fine to return less than the full
+   * count as long as we're making progress.
+   */
+  return next_ops->extents (nxdata,
+                            MIN (count, maxlen),
+                            ROUND_DOWN (offset, minblock),
+                            flags, extents, err);
+}
+
 static struct nbdkit_filter filter = {
   .name              = "blocksize",
   .longname          = "nbdkit blocksize filter",
@@ -386,6 +404,7 @@ static struct nbdkit_filter filter = {
   .pwrite            = blocksize_pwrite,
   .trim              = blocksize_trim,
   .zero              = blocksize_zero,
+  .extents           = blocksize_extents,
 };
 
 NBDKIT_REGISTER_FILTER(filter)
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 05/15] cow: Disable extents information in this filter.
The cow filter doesn't support trimming and we should assume the
underlying plugin is fully allocated too.  Note that both of these
limitations might be lifted with a more advanced filter
implementation.
However we ought to support this in future because xz files contain
sparseness information, so add a note.
---
 filters/cow/cow.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/filters/cow/cow.c b/filters/cow/cow.c
index b8f08c4..570a65f 100644
--- a/filters/cow/cow.c
+++ b/filters/cow/cow.c
@@ -114,7 +114,7 @@ cow_prepare (struct nbdkit_next_ops *next_ops, void *nxdata,
 }
 
 /* Whatever the underlying plugin can or can't do, we can write, we
- * cannot trim, and we can flush.
+ * cannot trim or detect extents, and we can flush.
  */
 static int
 cow_can_write (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle)
@@ -128,6 +128,12 @@ cow_can_trim (struct nbdkit_next_ops *next_ops, void
*nxdata, void *handle)
   return 0;
 }
 
+static int
+cow_can_extents (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle)
+{
+  return 0;
+}
+
 static int
 cow_can_flush (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle)
 {
@@ -316,6 +322,7 @@ static struct nbdkit_filter filter = {
   .can_write         = cow_can_write,
   .can_flush         = cow_can_flush,
   .can_trim          = cow_can_trim,
+  .can_extents       = cow_can_extents,
   .can_fua           = cow_can_fua,
   .pread             = cow_pread,
   .pwrite            = cow_pwrite,
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 06/15] delay: Allow block status (extents) requests to be separately delayed.
---
 filters/delay/nbdkit-delay-filter.pod |  8 ++++++++
 filters/delay/delay.c                 | 26 ++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)
diff --git a/filters/delay/nbdkit-delay-filter.pod
b/filters/delay/nbdkit-delay-filter.pod
index c2eb172..2e2ac74 100644
--- a/filters/delay/nbdkit-delay-filter.pod
+++ b/filters/delay/nbdkit-delay-filter.pod
@@ -11,6 +11,7 @@ nbdkit-delay-filter - nbdkit delay filter
  nbdkit --filter=delay plugin [plugin-args ...]
           delay-read=(SECS|NNms) delay-write=(SECS|NNms)
           delay-zero=(SECS|NNms) delay-trim=(SECS|NNms)
+          delay-extents=(SECS|NNms)
 
 =head1 DESCRIPTION
 
@@ -63,6 +64,13 @@ Delay zero operations by C<SECS> seconds or C<NN>
milliseconds.
 
 Delay trim/discard operations by C<SECS> seconds or C<NN>
milliseconds.
 
+=item B<delay-extents=>SECS
+
+=item B<delay-extents=>NNB<ms>
+
+Delay block status (extents) operations by C<SECS> seconds or C<NN>
+milliseconds.
+
 =item B<wdelay=>SECS
 
 =item B<wdelay=>NNB<ms>
diff --git a/filters/delay/delay.c b/filters/delay/delay.c
index a00292b..d11958b 100644
--- a/filters/delay/delay.c
+++ b/filters/delay/delay.c
@@ -47,6 +47,7 @@ static int delay_read_ms = 0;   /* read delay (milliseconds)
*/
 static int delay_write_ms = 0;  /* write delay (milliseconds) */
 static int delay_zero_ms = 0;   /* zero delay (milliseconds) */
 static int delay_trim_ms = 0;   /* trim delay (milliseconds) */
+static int delay_extents_ms = 0;/* extents delay (milliseconds) */
 
 static int
 parse_delay (const char *key, const char *value)
@@ -110,6 +111,12 @@ trim_delay (void)
   delay (delay_trim_ms);
 }
 
+static void
+extents_delay (void)
+{
+  delay (delay_extents_ms);
+}
+
 /* Called for each key=value passed on the command line. */
 static int
 delay_config (nbdkit_next_config *next, void *nxdata,
@@ -154,6 +161,13 @@ delay_config (nbdkit_next_config *next, void *nxdata,
       return -1;
     return 0;
   }
+  else if (strcmp (key, "delay-extent") == 0 ||
+           strcmp (key, "delay-extents") == 0) {
+    delay_extents_ms = parse_delay (key, value);
+    if (delay_extents_ms == -1)
+      return -1;
+    return 0;
+  }
   else
     return next (nxdata, key, value);
 }
@@ -164,6 +178,7 @@ delay_config (nbdkit_next_config *next, void *nxdata,
   "delay-write=<NN>[ms]           Write delay in
seconds/milliseconds.\n" \
   "delay-zero=<NN>[ms]            Zero delay in
seconds/milliseconds.\n" \
   "delay-trim=<NN>[ms]            Trim delay in
seconds/milliseconds.\n" \
+  "delay-extents=<NN>[ms]         Extents delay in
seconds/milliseconds.\n" \
   "wdelay=<NN>[ms]                Write, zero and trim delay in
secs/msecs."
 
 /* Read data. */
@@ -207,6 +222,16 @@ delay_trim (struct nbdkit_next_ops *next_ops, void *nxdata,
   return next_ops->trim (nxdata, count, offset, flags, err);
 }
 
+/* Extents. */
+static int
+delay_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
+               void *handle, uint32_t count, uint64_t offset, uint32_t flags,
+               struct nbdkit_extents *extents, int *err)
+{
+  extents_delay ();
+  return next_ops->extents (nxdata, count, offset, flags, extents, err);
+}
+
 static struct nbdkit_filter filter = {
   .name              = "delay",
   .longname          = "nbdkit delay filter",
@@ -217,6 +242,7 @@ static struct nbdkit_filter filter = {
   .pwrite            = delay_pwrite,
   .zero              = delay_zero,
   .trim              = delay_trim,
+  .extents           = delay_extents,
 };
 
 NBDKIT_REGISTER_FILTER(filter)
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 07/15] error: Extend the error filter so it can inject errors into block status extents requests.
---
 filters/error/nbdkit-error-filter.pod | 11 ++++++--
 filters/error/error.c                 | 36 ++++++++++++++++++++++++---
 2 files changed, 41 insertions(+), 6 deletions(-)
diff --git a/filters/error/nbdkit-error-filter.pod
b/filters/error/nbdkit-error-filter.pod
index 504e1b0..16ef184 100644
--- a/filters/error/nbdkit-error-filter.pod
+++ b/filters/error/nbdkit-error-filter.pod
@@ -12,6 +12,7 @@ nbdkit-error-filter - inject errors for testing clients
      [error-pwrite=...] [error-pwrite-rate=...] [error-pwrite-file=...]
      [error-trim=...] [error-trim-rate=...] [error-trim-file=...]
      [error-zero=...] [error-zero-rate=...] [error-zero-file=...]
+     [error-extents=...] [error-extents-rate=...] [error-extents-file=...]
 
 =head1 DESCRIPTION
 
@@ -29,8 +30,9 @@ Inject a low rate of errors randomly into the connection:
 
  nbdkit --filter=error file disk.img error-rate=1%
 
-Reading and trimming requests will be successful, but all
-writes and zeroing will return "No space left on device":
+Reading, trimming and extents (block status) requests will be
+successful, but all writes and zeroing will return "No space left on
+device":
 
  nbdkit --filter=error file disk.img \
                             error=ENOSPC \
@@ -105,6 +107,11 @@ settings to NBD trim requests.
 Same as C<error>, C<error-rate> and C<error-file> but only
apply the
 settings to NBD zero requests.
 
+=item B<error-extents>, B<error-extents-rate>,
B<error-extents-file>.
+
+Same as C<error>, C<error-rate> and C<error-file> but only
apply the
+settings to NBD block status requests to read extents.
+
 =back
 
 =head1 NOTES
diff --git a/filters/error/error.c b/filters/error/error.c
index 598bd1f..405bbe3 100644
--- a/filters/error/error.c
+++ b/filters/error/error.c
@@ -63,6 +63,7 @@ static struct error_settings pread_settings = ERROR_DEFAULT;
 static struct error_settings pwrite_settings = ERROR_DEFAULT;
 static struct error_settings trim_settings = ERROR_DEFAULT;
 static struct error_settings zero_settings = ERROR_DEFAULT;
+static struct error_settings extents_settings = ERROR_DEFAULT;
 
 /* Random state.
  * This must only be accessed when holding the lock (except for load).
@@ -83,6 +84,7 @@ error_unload (void)
   free (pwrite_settings.file);
   free (trim_settings.file);
   free (zero_settings.file);
+  free (extents_settings.file);
 }
 
 static const struct { const char *name; int error; } errors[] = {
@@ -162,7 +164,8 @@ error_config (nbdkit_next_config *next, void *nxdata,
     if (parse_error (key, value, &i) == -1)
       return -1;
     pread_settings.error = pwrite_settings.error -      trim_settings.error =
zero_settings.error = i;
+      trim_settings.error = zero_settings.error +      extents_settings.error =
i;
     return 0;
   }
   else if (strcmp (key, "error-pread") == 0)
@@ -173,12 +176,15 @@ error_config (nbdkit_next_config *next, void *nxdata,
     return parse_error (key, value, &trim_settings.error);
   else if (strcmp (key, "error-zero") == 0)
     return parse_error (key, value, &zero_settings.error);
+  else if (strcmp (key, "error-extents") == 0)
+    return parse_error (key, value, &extents_settings.error);
 
   else if (strcmp (key, "error-rate") == 0) {
     if (parse_error_rate (key, value, &d) == -1)
       return -1;
     pread_settings.rate = pwrite_settings.rate -      trim_settings.rate =
zero_settings.rate = d;
+      trim_settings.rate = zero_settings.rate +      extents_settings.rate = d;
     return 0;
   }
   else if (strcmp (key, "error-pread-rate") == 0)
@@ -189,6 +195,8 @@ error_config (nbdkit_next_config *next, void *nxdata,
     return parse_error_rate (key, value, &trim_settings.rate);
   else if (strcmp (key, "error-zero-rate") == 0)
     return parse_error_rate (key, value, &zero_settings.rate);
+  else if (strcmp (key, "error-extents-rate") == 0)
+    return parse_error_rate (key, value, &extents_settings.rate);
 
   /* NB: We are using nbdkit_absolute_path here because the trigger
    * file probably doesn't exist yet.
@@ -202,6 +210,8 @@ error_config (nbdkit_next_config *next, void *nxdata,
     trim_settings.file = nbdkit_absolute_path (value);
     free (zero_settings.file);
     zero_settings.file = nbdkit_absolute_path (value);
+    free (extents_settings.file);
+    extents_settings.file = nbdkit_absolute_path (value);
     return 0;
   }
   else if (strcmp (key, "error-pread-rate") == 0) {
@@ -224,6 +234,11 @@ error_config (nbdkit_next_config *next, void *nxdata,
     zero_settings.file = nbdkit_absolute_path (value);
     return 0;
   }
+  else if (strcmp (key, "error-extents-rate") == 0) {
+    free (extents_settings.file);
+    extents_settings.file = nbdkit_absolute_path (value);
+    return 0;
+  }
 
   else
     return next (nxdata, key, value);
@@ -234,8 +249,8 @@ error_config (nbdkit_next_config *next, void *nxdata,
   "                               The error indication to return.\n"
\
   "error-rate=0%..100%|0..1       Rate of errors to generate.\n" \
   "error-file=TRIGGER             Set trigger filename.\n" \
-  "error-pread*, error-pwrite*, error-trim*, error-zero*\n" \
-  "                               Apply settings only to
read/write/trim/zero"
+  "error-pread*, error-pwrite*, error-trim*, error-zero*,
error-extents*\n" \
+  "                               Apply settings only to
read/write/etc"
 
 static void *
 error_open (nbdkit_next_open *next, void *nxdata, int readonly)
@@ -330,6 +345,18 @@ error_zero (struct nbdkit_next_ops *next_ops, void *nxdata,
   return next_ops->zero (nxdata, count, offset, flags, err);
 }
 
+/* Extents. */
+static int
+error_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
+               void *handle, uint32_t count, uint64_t offset,
+               uint32_t flags, struct nbdkit_extents *extents, int *err)
+{
+  if (random_error (&extents_settings, "extents", err))
+    return -1;
+
+  return next_ops->extents (nxdata, count, offset, flags, extents, err);
+}
+
 static struct nbdkit_filter filter = {
   .name              = "error",
   .longname          = "nbdkit error filter",
@@ -343,6 +370,7 @@ static struct nbdkit_filter filter = {
   .pwrite            = error_pwrite,
   .trim              = error_trim,
   .zero              = error_zero,
+  .extents           = error_extents,
 };
 
 NBDKIT_REGISTER_FILTER(filter)
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 08/15] log: Log extents requests.
Typical output:
2019-03-26 18:22:17.287434 connection=2 Extents id=2 offset=0x0 count=0x40000000
req_one=1 ...
2019-03-26 18:22:17.290040 connection=2 ...Extents id=2 extents=[{ offset=0x0,
length=0x250000, type=0 }, { offset=0x250000, length=0x7db0000, type=3 }, {
offset=0x8000000, length=0x8000, type=0 }, { offset=0x8008000, length=0x38000,
type=3 }, { offset=0x8040000, length=0x208000, type=0 }, { offset=0x8248000,
length=0x7db8000, type=3 }, { offset=0x10000000, length=0x208000, type=0 }, {
offset=0x10208000, length=0x7df8000, type=3 }, { offset=0x18000000,
length=0x8000, type=0 }, { offset=0x18008000, length=0x38000, type=3 }, {
offset=0x18040000, length=0x208000, type=0 }, { offset=0x18248000,
length=0x7db8000, type=3 }, { offset=0x20000000, length=0x208000, type=0 }, {
offset=0x20208000, length=0x7df8000, type=3 }, { offset=0x28000000,
length=0x8000, type=0 }, { offset=0x28008000, length=0x38000, type=3 }, {
offset=0x28040000, length=0x208000, type=0 }, { offset=0x28248000,
length=0x7db8000, type=3 }, { offset=0x30000000, length=0x208000, type=0 }, {
offset=0x30208000, length=0x7df8000, type=3 }, { offset=0x38000000,
length=0x8000, type=0 }, { offset=0x38008000, length=0x38000, type=3 }, {
offset=0x38040000, length=0x208000, type=0 }, { offset=0x38248000,
length=0x7da8000, type=3 }, { offset=0x3fff0000, length=0x10000, type=0 }]
return=0
As you can see this is logging the complete information as generated
by the underlying plugin, not what is returned to the client (which is
rather hard for the log filter to discern).  It's probably more useful
for debugging like this.
---
 filters/log/log.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
diff --git a/filters/log/log.c b/filters/log/log.c
index 9f2e37d..b31d784 100644
--- a/filters/log/log.c
+++ b/filters/log/log.c
@@ -353,6 +353,50 @@ log_zero (struct nbdkit_next_ops *next_ops, void *nxdata,
   return r;
 }
 
+/* Extents. */
+static int
+log_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
+             void *handle, uint32_t count, uint64_t offs, uint32_t flags,
+             struct nbdkit_extents *extents, int *err)
+{
+  struct handle *h = handle;
+  uint64_t id = get_id (h);
+  int r;
+
+  assert (!(flags & ~(NBDKIT_FLAG_REQ_ONE)));
+  output (h, "Extents", id,
+          "offset=0x%" PRIx64 " count=0x%x req_one=%d ...",
+          offs, count, !!(flags & NBDKIT_FLAG_REQ_ONE));
+  r = next_ops->extents (nxdata, count, offs, flags, extents, err);
+  if (r == -1)
+    output_return (h, "...Extents", id, r, err);
+  else {
+    FILE *fp;
+    char *extents_str = NULL;
+    size_t i, n, len = 0;
+
+    fp = open_memstream (&extents_str, &len);
+    if (fp != NULL) {
+      n = nbdkit_extents_count (extents);
+      for (i = 0; i < n; ++i) {
+        struct nbdkit_extent e = nbdkit_get_extent (extents, i);
+        if (i > 0)
+          fprintf (fp, ", ");
+        fprintf (fp, "{ offset=0x%" PRIx64 ", length=0x%"
PRIx64 ", "
+                 "type=%" PRIu32 " }",
+                 e.offset, e.length, e.type);
+      }
+
+      fclose (fp);
+    }
+
+    output (h, "...Extents", id, "extents=[%s] return=0",
+            extents_str ? extents_str : "(null)");
+    free (extents_str);
+  }
+  return r;
+}
+
 static struct nbdkit_filter filter = {
   .name              = "log",
   .longname          = "nbdkit log filter",
@@ -370,6 +414,7 @@ static struct nbdkit_filter filter = {
   .flush             = log_flush,
   .trim              = log_trim,
   .zero              = log_zero,
+  .extents           = log_extents,
 };
 
 NBDKIT_REGISTER_FILTER(filter)
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 09/15] offset: Implement mapping of extents.
Allows you to safely use nbdkit-offset-filter on top of a plugin
supporting extents.
---
 filters/offset/offset.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
diff --git a/filters/offset/offset.c b/filters/offset/offset.c
index 058571d..fa22615 100644
--- a/filters/offset/offset.c
+++ b/filters/offset/offset.c
@@ -37,6 +37,7 @@
 #include <stdlib.h>
 #include <stdint.h>
 #include <string.h>
+#include <assert.h>
 
 #include <nbdkit-filter.h>
 
@@ -132,6 +133,39 @@ offset_zero (struct nbdkit_next_ops *next_ops, void
*nxdata,
   return next_ops->zero (nxdata, count, offs + offset, flags, err);
 }
 
+/* Extents. */
+static int
+offset_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
+                void *handle, uint32_t count, uint64_t offs, uint32_t flags,
+                struct nbdkit_extents *extents, int *err)
+{
+  size_t i;
+  struct nbdkit_extents *extents2;
+  struct nbdkit_extent e;
+
+  extents2 = nbdkit_extents_new (offs + offset);
+  if (extents2 == NULL) {
+    *err = errno;
+    return -1;
+  }
+  if (next_ops->extents (nxdata, count, offs + offset,
+                         flags, extents2, err) == -1)
+    goto error;
+
+  for (i = 0; i < nbdkit_extents_count (extents2); ++i) {
+    e = nbdkit_get_extent (extents2, i);
+    e.offset -= offset;
+    if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1)
+      goto error;
+  }
+  nbdkit_extents_free (extents2);
+  return 0;
+
+ error:
+  nbdkit_extents_free (extents2);
+  return -1;
+}
+
 static struct nbdkit_filter filter = {
   .name              = "offset",
   .longname          = "nbdkit offset filter",
@@ -144,6 +178,7 @@ static struct nbdkit_filter filter = {
   .pwrite            = offset_pwrite,
   .trim              = offset_trim,
   .zero              = offset_zero,
+  .extents           = offset_extents,
 };
 
 NBDKIT_REGISTER_FILTER(filter)
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 10/15] partition: Implement mapping of extents.
---
 filters/partition/partition.c | 36 +++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
diff --git a/filters/partition/partition.c b/filters/partition/partition.c
index c1fe7a1..853d734 100644
--- a/filters/partition/partition.c
+++ b/filters/partition/partition.c
@@ -38,6 +38,7 @@
 #include <stdint.h>
 #include <string.h>
 #include <inttypes.h>
+#include <assert.h>
 
 #include <nbdkit-filter.h>
 
@@ -222,6 +223,40 @@ partition_zero (struct nbdkit_next_ops *next_ops, void
*nxdata,
   return next_ops->zero (nxdata, count, offs + h->offset, flags, err);
 }
 
+/* Extents. */
+static int
+partition_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
+                   void *handle, uint32_t count, uint64_t offs, uint32_t flags,
+                   struct nbdkit_extents *extents, int *err)
+{
+  struct handle *h = handle;
+  size_t i;
+  struct nbdkit_extents *extents2;
+  struct nbdkit_extent e;
+
+  extents2 = nbdkit_extents_new (offs + h->offset);
+  if (extents2 == NULL) {
+    *err = errno;
+    return -1;
+  }
+  if (next_ops->extents (nxdata, count, offs + h->offset,
+                         flags, extents2, err) == -1)
+    goto error;
+
+  for (i = 0; i < nbdkit_extents_count (extents2); ++i) {
+    e = nbdkit_get_extent (extents2, i);
+    e.offset -= h->offset;
+    if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1)
+      goto error;
+  }
+  nbdkit_extents_free (extents2);
+  return 0;
+
+ error:
+  nbdkit_extents_free (extents2);
+  return -1;
+}
+
 static struct nbdkit_filter filter = {
   .name              = "partition",
   .longname          = "nbdkit partition filter",
@@ -237,6 +272,7 @@ static struct nbdkit_filter filter = {
   .pwrite            = partition_pwrite,
   .trim              = partition_trim,
   .zero              = partition_zero,
+  .extents           = partition_extents,
 };
 
 NBDKIT_REGISTER_FILTER(filter)
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 11/15] truncate: Implement extents for beyond end of truncated region.
---
 filters/truncate/truncate.c | 56 +++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)
diff --git a/filters/truncate/truncate.c b/filters/truncate/truncate.c
index b95432a..11d32ff 100644
--- a/filters/truncate/truncate.c
+++ b/filters/truncate/truncate.c
@@ -285,6 +285,61 @@ truncate_zero (struct nbdkit_next_ops *next_ops, void
*nxdata,
   return 0;
 }
 
+/* Extents. */
+static int
+truncate_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
+                  void *handle, uint32_t count, uint64_t offset,
+                  uint32_t flags, struct nbdkit_extents *extents, int *err)
+{
+  int r;
+  uint32_t n;
+  uint64_t real_size_copy;
+
+  pthread_mutex_lock (&lock);
+  real_size_copy = real_size;
+  pthread_mutex_unlock (&lock);
+
+  /* If the entire request is beyond the end of the underlying plugin
+   * then this is the easy case: return a hole.
+   */
+  if (offset >= real_size_copy) {
+    return nbdkit_add_extent (extents, offset, (uint64_t) count,
+                              NBDKIT_EXTENT_ZERO|NBDKIT_EXTENT_HOLE);
+  }
+
+  if (offset < real_size_copy) {
+    if (offset + count <= real_size_copy)
+      n = count;
+    else
+      n = real_size_copy - offset;
+    r = next_ops->extents (nxdata, n, offset, flags, extents, err);
+    if (r == -1)
+      return -1;
+    count -= n;
+  }
+
+  if (count > 0) {
+    size_t nr_extents = nbdkit_extents_count (extents);
+
+    /* If we are asked for extent information beyond the end of the
+     * real size of the underlying device, then we return a hole.
+     * However as we don't know if the underlying device returned the
+     * full extents data (it's not required to), check that we won't
+     * break the extents invariant first.
+     */
+    if (nr_extents > 0) {
+      struct nbdkit_extent e;
+
+      e = nbdkit_get_extent (extents, nr_extents-1);
+      if (e.offset + e.length == real_size_copy)
+        return nbdkit_add_extent (extents, real_size_copy, (uint64_t) count,
+                                  NBDKIT_EXTENT_ZERO|NBDKIT_EXTENT_HOLE);
+    }
+  }
+
+  return 0;
+}
+
 static struct nbdkit_filter filter = {
   .name              = "truncate",
   .longname          = "nbdkit truncate filter",
@@ -297,6 +352,7 @@ static struct nbdkit_filter filter = {
   .pwrite            = truncate_pwrite,
   .trim              = truncate_trim,
   .zero              = truncate_zero,
+  .extents           = truncate_extents,
 };
 
 NBDKIT_REGISTER_FILTER(filter)
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 12/15] xz: Disable extents information in this filter.
However we ought to support this in future because xz files contain
sparseness information, so add a note.
---
 filters/xz/xz.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/filters/xz/xz.c b/filters/xz/xz.c
index ec41f25..3013538 100644
--- a/filters/xz/xz.c
+++ b/filters/xz/xz.c
@@ -185,6 +185,16 @@ xz_can_write (struct nbdkit_next_ops *next_ops, void
*nxdata,
   return 0;
 }
 
+/* Similar to above.  However xz files themselves do support
+ * sparseness so in future we should generate extents information. XXX
+ */
+static int
+xz_can_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
+                void *handle)
+{
+  return 0;
+}
+
 /* Read data from the file. */
 static int
 xz_pread (struct nbdkit_next_ops *next_ops, void *nxdata,
@@ -238,6 +248,7 @@ static struct nbdkit_filter filter = {
   .prepare           = xz_prepare,
   .get_size          = xz_get_size,
   .can_write         = xz_can_write,
+  .can_extents       = xz_can_extents,
   .pread             = xz_pread,
 };
 
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 13/15] data, memory: Implement extents.
These plugins are both based on the same sparse array structure which
supports a simple implementation of extents.
---
 common/sparse/sparse.h  |  7 ++++++-
 common/sparse/sparse.c  | 29 ++++++++++++++++++++++++++++-
 plugins/data/data.c     | 16 +++++++++++++++-
 plugins/memory/memory.c | 16 +++++++++++++++-
 4 files changed, 64 insertions(+), 4 deletions(-)
diff --git a/common/sparse/sparse.h b/common/sparse/sparse.h
index 818d804..eb24a0b 100644
--- a/common/sparse/sparse.h
+++ b/common/sparse/sparse.h
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2017-2018 Red Hat Inc.
+ * Copyright (C) 2017-2019 Red Hat Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -87,4 +87,9 @@ extern void sparse_array_zero (struct sparse_array *sa,
                                uint32_t count, uint64_t offset)
   __attribute__((__nonnull__ (1)));
 
+/* Return information about allocated pages and holes. */
+extern int sparse_array_extents (struct sparse_array *sa,
+                                 uint32_t count, uint64_t offset,
+                                 struct nbdkit_extents *extents);
+
 #endif /* NBDKIT_SPARSE_H */
diff --git a/common/sparse/sparse.c b/common/sparse/sparse.c
index a5ace48..f388cd4 100644
--- a/common/sparse/sparse.c
+++ b/common/sparse/sparse.c
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2017-2018 Red Hat Inc.
+ * Copyright (C) 2017-2019 Red Hat Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -360,3 +360,30 @@ sparse_array_zero (struct sparse_array *sa, uint32_t count,
uint64_t offset)
     offset += n;
   }
 }
+
+int
+sparse_array_extents (struct sparse_array *sa,
+                      uint32_t count, uint64_t offset,
+                      struct nbdkit_extents *extents)
+{
+  uint32_t n, type;
+  void *p;
+
+  while (count > 0) {
+    p = lookup (sa, offset, false, &n, NULL);
+
+    if (p == NULL)
+      type = NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO;
+    else
+      type = 0; /* allocated data */
+    if (nbdkit_add_extent (extents, offset, n, type) == -1)
+      return -1;
+
+    if (n > count)
+      n = count;
+    count -= n;
+    offset += n;
+  }
+
+  return 0;
+}
diff --git a/plugins/data/data.c b/plugins/data/data.c
index f9d3881..c5540f6 100644
--- a/plugins/data/data.c
+++ b/plugins/data/data.c
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2018 Red Hat Inc.
+ * Copyright (C) 2018-2019 Red Hat Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -378,6 +378,19 @@ data_trim (void *handle, uint32_t count, uint64_t offset)
   return 0;
 }
 
+/* Extents. */
+static int
+data_extents (void *handle, uint32_t count, uint64_t offset,
+              uint32_t flags, struct nbdkit_extents *extents)
+{
+  int r;
+
+  pthread_mutex_lock (&lock);
+  r = sparse_array_extents (sa, count, offset, extents);
+  pthread_mutex_unlock (&lock);
+  return r;
+}
+
 static struct nbdkit_plugin plugin = {
   .name              = "data",
   .version           = PACKAGE_VERSION,
@@ -394,6 +407,7 @@ static struct nbdkit_plugin plugin = {
   .pwrite            = data_pwrite,
   .zero              = data_zero,
   .trim              = data_trim,
+  .extents           = data_extents,
   /* In this plugin, errno is preserved properly along error return
    * paths from failed system calls.
    */
diff --git a/plugins/memory/memory.c b/plugins/memory/memory.c
index e27e127..741eaad 100644
--- a/plugins/memory/memory.c
+++ b/plugins/memory/memory.c
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2017-2018 Red Hat Inc.
+ * Copyright (C) 2017-2019 Red Hat Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -170,6 +170,19 @@ memory_trim (void *handle, uint32_t count, uint64_t offset)
   return 0;
 }
 
+/* Extents. */
+static int
+memory_extents (void *handle, uint32_t count, uint64_t offset,
+                uint32_t flags, struct nbdkit_extents *extents)
+{
+  int r;
+
+  pthread_mutex_lock (&lock);
+  r = sparse_array_extents (sa, count, offset, extents);
+  pthread_mutex_unlock (&lock);
+  return r;
+}
+
 static struct nbdkit_plugin plugin = {
   .name              = "memory",
   .version           = PACKAGE_VERSION,
@@ -185,6 +198,7 @@ static struct nbdkit_plugin plugin = {
   .pwrite            = memory_pwrite,
   .zero              = memory_zero,
   .trim              = memory_trim,
+  .extents           = memory_extents,
   /* In this plugin, errno is preserved properly along error return
    * paths from failed system calls.
    */
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 14/15] file: Implement extents.
This uses lseek SEEK_DATA/SEEK_HOLE to search for allocated data and
holes in the underlying file.
---
 plugins/file/file.c | 141 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 131 insertions(+), 10 deletions(-)
diff --git a/plugins/file/file.c b/plugins/file/file.c
index 628f8fb..2fbc2a3 100644
--- a/plugins/file/file.c
+++ b/plugins/file/file.c
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2013-2018 Red Hat Inc.
+ * Copyright (C) 2013-2019 Red Hat Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -37,6 +37,7 @@
 #include <stdlib.h>
 #include <stdbool.h>
 #include <string.h>
+#include <inttypes.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <sys/types.h>
@@ -44,6 +45,8 @@
 #include <sys/ioctl.h>
 #include <errno.h>
 
+#include <pthread.h>
+
 #if defined(__linux__) && !defined(FALLOC_FL_PUNCH_HOLE)
 #include <linux/falloc.h>   /* For FALLOC_FL_*, glibc < 2.18 */
 #endif
@@ -68,7 +71,11 @@
 
 static char *filename = NULL;
 
-int file_debug_zero;            /* to enable: -D file.zero=1 */
+/* Any callbacks using lseek must be protected by this lock. */
+static pthread_mutex_t lseek_lock = PTHREAD_MUTEX_INITIALIZER;
+
+/* to enable: -D file.zero=1 */
+int file_debug_zero;
 
 static void
 file_unload (void)
@@ -220,6 +227,23 @@ file_close (void *handle)
 
 #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL
 
+/* For block devices, stat->st_size is not the true size.  The caller
+ * grabs the lseek_lock.
+ */
+static int64_t
+block_device_size (int fd)
+{
+  off_t size;
+
+  size = lseek (fd, 0, SEEK_END);
+  if (size == -1) {
+    nbdkit_error ("lseek (to find device size): %m");
+    return -1;
+  }
+
+  return size;
+}
+
 /* Get the file size. */
 static int64_t
 file_get_size (void *handle)
@@ -227,15 +251,11 @@ file_get_size (void *handle)
   struct handle *h = handle;
 
   if (h->is_block_device) {
-    /* Block device, so st_size will not be the true size. */
-    off_t size;
-
-    size = lseek (h->fd, 0, SEEK_END);
-    if (size == -1) {
-      nbdkit_error ("lseek (to find device size): %m");
-      return -1;
-    }
+    int64_t size;
 
+    pthread_mutex_lock (&lseek_lock);
+    size = block_device_size (h->fd);
+    pthread_mutex_unlock (&lseek_lock);
     return size;
   } else {
     /* Regular file. */
@@ -501,6 +521,103 @@ file_trim (void *handle, uint32_t count, uint64_t offset,
uint32_t flags)
   return 0;
 }
 
+#ifdef SEEK_HOLE
+/* Extents. */
+
+static int
+file_can_extents (void *handle)
+{
+  struct handle *h = handle;
+  off_t r;
+
+  /* A simple test to see whether SEEK_HOLE etc is likely to work on
+   * the current filesystem.
+   */
+  pthread_mutex_lock (&lseek_lock);
+  r = lseek (h->fd, 0, SEEK_HOLE);
+  pthread_mutex_unlock (&lseek_lock);
+  if (r == -1) {
+    nbdkit_debug ("extents disabled: lseek: SEEK_HOLE: %m");
+    return 0;
+  }
+  return 1;
+}
+
+static int
+do_extents (void *handle, uint32_t count, uint64_t offset,
+            uint32_t flags, struct nbdkit_extents *extents)
+{
+  struct handle *h = handle;
+  const bool req_one = flags & NBDKIT_FLAG_REQ_ONE;
+  uint64_t end = offset + count;
+
+  do {
+    off_t pos;
+
+    pos = lseek (h->fd, offset, SEEK_DATA);
+    if (pos == -1) {
+      if (errno == ENXIO) {
+        /* The current man page does not describe this situation well,
+         * but a proposed change to POSIX adds these words for ENXIO:
+         * "or the whence argument is SEEK_DATA and the offset falls
+         * within the final hole of the file."
+         */
+        pos = end;
+      }
+      else {
+        nbdkit_error ("lseek: SEEK_DATA: %" PRIu64 ": %m",
offset);
+        return -1;
+      }
+    }
+
+    /* We know there is a hole from offset to pos-1. */
+    if (pos > offset) {
+      if (nbdkit_add_extent (extents, offset, pos - offset,
+                             NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO) == -1)
+        return -1;
+      if (req_one)
+        break;
+    }
+
+    offset = pos;
+    if (offset >= end)
+      break;
+
+    pos = lseek (h->fd, offset, SEEK_HOLE);
+    if (pos == -1) {
+      nbdkit_error ("lseek: SEEK_HOLE: %" PRIu64 ": %m",
offset);
+      return -1;
+    }
+
+    /* We know there is data from offset to pos-1. */
+    if (pos > offset) {
+      if (nbdkit_add_extent (extents, offset, pos - offset,
+                             0 /* allocated data */) == -1)
+        return -1;
+      if (req_one)
+        break;
+    }
+
+    offset = pos;
+  } while (offset < end);
+
+  return 0;
+}
+
+static int
+file_extents (void *handle, uint32_t count, uint64_t offset,
+              uint32_t flags, struct nbdkit_extents *extents)
+{
+  int r;
+
+  pthread_mutex_lock (&lseek_lock);
+  r = do_extents (handle, count, offset, flags, extents);
+  pthread_mutex_unlock (&lseek_lock);
+
+  return r;
+}
+#endif /* SEEK_HOLE */
+
 static struct nbdkit_plugin plugin = {
   .name              = "file",
   .longname          = "nbdkit file plugin",
@@ -522,6 +639,10 @@ static struct nbdkit_plugin plugin = {
   .flush             = file_flush,
   .trim              = file_trim,
   .zero              = file_zero,
+#ifdef SEEK_HOLE
+  .can_extents       = file_can_extents,
+  .extents           = file_extents,
+#endif
   .errno_is_preserved = 1,
 };
 
-- 
2.20.1
Richard W.M. Jones
2019-Mar-26  21:17 UTC
[Libguestfs] [PATCH nbdkit v4 15/15] vddk: Implement extents.
This uses a new API VixDiskLib_QueryAllocatedBlocks provided in
VDDK >= 6.7.
Thanks: Martin Kletzander.
---
 plugins/vddk/vddk-structs.h |  15 +++-
 plugins/vddk/vddk.c         | 141 ++++++++++++++++++++++++++++++++++++
 2 files changed, 155 insertions(+), 1 deletion(-)
diff --git a/plugins/vddk/vddk-structs.h b/plugins/vddk/vddk-structs.h
index dbed94a..df88322 100644
--- a/plugins/vddk/vddk-structs.h
+++ b/plugins/vddk/vddk-structs.h
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2013-2018 Red Hat Inc.
+ * Copyright (C) 2013-2019 Red Hat Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -47,6 +47,9 @@ typedef uint64_t VixError;
 #define VIXDISKLIB_FLAG_OPEN_READ_ONLY 4
 #define VIXDISKLIB_SECTOR_SIZE 512
 
+#define VIXDISKLIB_MIN_CHUNK_SIZE 128
+#define VIXDISKLIB_MAX_CHUNK_NUMBER (512*1024)
+
 typedef void *VixDiskLibConnection;
 typedef void *VixDiskLibHandle;
 
@@ -124,4 +127,14 @@ typedef struct VixDiskLibInfo {
   char *uuid;
 } VixDiskLibInfo;
 
+typedef struct {
+  uint64_t offset;
+  uint64_t length;
+} VixDiskLibBlock;
+
+typedef struct {
+  uint32_t numBlocks;
+  VixDiskLibBlock blocks[1];
+} VixDiskLibBlockList;
+
 #endif /* NBDKIT_VDDK_STRUCTS_H */
diff --git a/plugins/vddk/vddk.c b/plugins/vddk/vddk.c
index 56871cc..436aa4d 100644
--- a/plugins/vddk/vddk.c
+++ b/plugins/vddk/vddk.c
@@ -45,6 +45,8 @@
 #include <nbdkit-plugin.h>
 
 #include "isaligned.h"
+#include "minmax.h"
+#include "rounding.h"
 
 #include "vddk-structs.h"
 
@@ -67,6 +69,8 @@ static VixError (*VixDiskLib_GetInfo) (VixDiskLibHandle
handle, VixDiskLibInfo *
 static void (*VixDiskLib_FreeInfo) (VixDiskLibInfo *info);
 static VixError (*VixDiskLib_Read) (VixDiskLibHandle handle, uint64_t
start_sector, uint64_t nr_sectors, unsigned char *buf);
 static VixError (*VixDiskLib_Write) (VixDiskLibHandle handle, uint64_t
start_sector, uint64_t nr_sectors, const unsigned char *buf);
+static VixError (*VixDiskLib_QueryAllocatedBlocks) (VixDiskLibHandle
diskHandle, uint64_t start_sector, uint64_t nr_sectors, uint64_t chunk_size,
VixDiskLibBlockList **block_list);
+static VixError (*VixDiskLib_FreeBlockList) (VixDiskLibBlockList *block_list);
 
 /* Parameters passed to InitEx. */
 #define VDDK_MAJOR 5
@@ -174,6 +178,11 @@ vddk_load (void)
   VixDiskLib_FreeInfo = dlsym (dl, "VixDiskLib_FreeInfo");
   VixDiskLib_Read = dlsym (dl, "VixDiskLib_Read");
   VixDiskLib_Write = dlsym (dl, "VixDiskLib_Write");
+
+  /* Added in VDDK 6.7, these will be NULL for earlier versions: */
+  VixDiskLib_QueryAllocatedBlocks +    dlsym (dl,
"VixDiskLib_QueryAllocatedBlocks");
+  VixDiskLib_FreeBlockList = dlsym (dl, "VixDiskLib_FreeBlockList");
 }
 
 static void
@@ -570,6 +579,136 @@ vddk_pwrite (void *handle, const void *buf, uint32_t
count, uint64_t offset)
   return 0;
 }
 
+static int
+vddk_can_extents (void *handle)
+{
+  struct vddk_handle *h = handle;
+  VixError err;
+  VixDiskLibBlockList *block_list;
+
+  /* This call was added in VDDK 6.7.  In earlier versions the
+   * function pointer will be NULL and we cannot query extents.
+   */
+  if (VixDiskLib_QueryAllocatedBlocks == NULL) {
+    nbdkit_debug ("can_extents: VixDiskLib_QueryAllocatedBlocks == NULL,
"
+                  "probably this is VDDK < 6.7");
+    return 0;
+  }
+
+  /* However even when the call is available it rarely works well so
+   * the best thing we can do here is to try the call and if it's
+   * non-functional return false.
+   */
+  DEBUG_CALL ("VixDiskLib_QueryAllocatedBlocks",
+              "handle, 0, %d sectors, %d sectors",
+              VIXDISKLIB_MIN_CHUNK_SIZE, VIXDISKLIB_MIN_CHUNK_SIZE);
+  err = VixDiskLib_QueryAllocatedBlocks (h->handle,
+                                         0, VIXDISKLIB_MIN_CHUNK_SIZE,
+                                         VIXDISKLIB_MIN_CHUNK_SIZE,
+                                         &block_list);
+  if (err == VIX_OK) {
+    DEBUG_CALL ("VixDiskLib_FreeBlockList", "block_list");
+    VixDiskLib_FreeBlockList (block_list);
+  }
+  if (err != VIX_OK) {
+    char *errmsg = VixDiskLib_GetErrorText (err, NULL);
+    nbdkit_debug ("can_extents: VixDiskLib_QueryAllocatedBlocks test
failed, "
+                  "extents support will be disabled: "
+                  "original error: %s",
+                  errmsg);
+    VixDiskLib_FreeErrorText (errmsg);
+    return 0;
+  }
+
+  return 1;
+}
+
+static int
+vddk_extents (void *handle, uint32_t count, uint64_t offset, uint32_t flags,
+              struct nbdkit_extents *extents)
+{
+  struct vddk_handle *h = handle;
+  bool req_one = flags & NBDKIT_FLAG_REQ_ONE;
+  uint64_t position, end, start_sector;
+
+  position = offset;
+
+  /* We can only query whole chunks.  Therefore start with the first
+   * chunk before offset.
+   */
+  end = offset + count;
+  start_sector +    ROUND_DOWN (offset, VIXDISKLIB_MIN_CHUNK_SIZE *
VIXDISKLIB_SECTOR_SIZE)
+    / VIXDISKLIB_SECTOR_SIZE;
+  while (start_sector * VIXDISKLIB_SECTOR_SIZE < end) {
+    VixError err;
+    uint32_t i;
+    uint64_t nr_chunks, nr_sectors;
+    VixDiskLibBlockList *block_list;
+
+    assert (IS_ALIGNED (start_sector, VIXDISKLIB_MIN_CHUNK_SIZE));
+
+    nr_chunks +      ROUND_UP (end - start_sector * VIXDISKLIB_SECTOR_SIZE,
+                VIXDISKLIB_MIN_CHUNK_SIZE * VIXDISKLIB_SECTOR_SIZE)
+      / (VIXDISKLIB_MIN_CHUNK_SIZE * VIXDISKLIB_SECTOR_SIZE);
+    nr_chunks = MIN (nr_chunks, VIXDISKLIB_MAX_CHUNK_NUMBER);
+    nr_sectors = nr_chunks * VIXDISKLIB_MIN_CHUNK_SIZE;
+
+    DEBUG_CALL ("VixDiskLib_QueryAllocatedBlocks",
+                "handle, %" PRIu64 " sectors, %" PRIu64
" sectors, "
+                "%d sectors",
+                start_sector, nr_sectors, VIXDISKLIB_MIN_CHUNK_SIZE);
+    err = VixDiskLib_QueryAllocatedBlocks (h->handle,
+                                           start_sector, nr_sectors,
+                                           VIXDISKLIB_MIN_CHUNK_SIZE,
+                                           &block_list);
+    if (err != VIX_OK) {
+      VDDK_ERROR (err, "VixDiskLib_QueryAllocatedBlocks");
+      return -1;
+    }
+
+    for (i = 0; i < block_list->numBlocks; ++i) {
+      uint64_t offset, length;
+
+      offset = block_list->blocks[i].offset * VIXDISKLIB_SECTOR_SIZE;
+      length = block_list->blocks[i].length * VIXDISKLIB_SECTOR_SIZE;
+
+      /* The query returns blocks.  We must insert holes between the
+       * blocks as necessary.
+       */
+      if (position < offset) {
+        if (nbdkit_add_extent (extents,
+                               offset, length,
+                               NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO) == -1)
+          goto error_in_add;
+      }
+
+      if (nbdkit_add_extent (extents,
+                             offset, length, 0 /* allocated data */) == -1) {
+      error_in_add:
+        DEBUG_CALL ("VixDiskLib_FreeBlockList",
"block_list");
+        VixDiskLib_FreeBlockList (block_list);
+        return -1;
+      }
+
+      position = offset + length;
+
+      if (req_one)
+        break;
+    }
+    DEBUG_CALL ("VixDiskLib_FreeBlockList", "block_list");
+    VixDiskLib_FreeBlockList (block_list);
+
+    start_sector += nr_sectors;
+
+    if (req_one)
+      break;
+  }
+
+  return 0;
+}
+
 static struct nbdkit_plugin plugin = {
   .name              = "vddk",
   .longname          = "VMware VDDK plugin",
@@ -585,6 +724,8 @@ static struct nbdkit_plugin plugin = {
   .get_size          = vddk_get_size,
   .pread             = vddk_pread,
   .pwrite            = vddk_pwrite,
+  .can_extents       = vddk_can_extents,
+  .extents           = vddk_extents,
 };
 
 NBDKIT_REGISTER_PLUGIN(plugin)
-- 
2.20.1
Eric Blake
2019-Mar-27  06:56 UTC
Re: [Libguestfs] [PATCH nbdkit v4 01/15] server: Implement extents/can_extents calls for plugins and filters.
On 3/26/19 4:17 PM, Richard W.M. Jones wrote:> This pair of calls allows plugins to describe which extents in the > virtual disk are allocated, holes or zeroes. > ---> +=head2 C<.extents> > + > + int extents (void *handle, uint32_t count, uint64_t offset, > + uint32_t flags, struct nbdkit_extents *extents); > + > +During the data serving phase, this callback is used to detect > +allocated, sparse and zeroed regions of the disk. > + > +This function will not be called if C<.can_extents> returned false. > +nbdkit's default behaviour in this case is to treat the whole virtual > +disk as if it was allocated. > + > +The callback should detect and return the list of extents overlapping > +the range C<[offset...offset+count-1]>. The C<extents> parameter > +points to an opaque object which the callback should fill in by > +calling C<nbdkit_add_extent>. See L</Extents list> below.[1]> + > +If there is an error, C<.extents> should call C<nbdkit_error> with an > +error message, and C<nbdkit_set_error> to record an appropriate error > +(unless C<errno> is sufficient), then return C<-1>. > + > +=head3 Extents list > + > +The plugin C<extents> callback is passed an opaque pointer C<struct > +nbdkit_extents *extents>. This structure represents a list of > +L<filesystem extents|https://en.wikipedia.org/wiki/Extent_(file_systems)> > +describing which areas of the disk are allocated, which are sparse > +(“holes”), and, if supported, which are zeroes. > + > +The C<extents> callback should scan the disk starting at C<offset> and > +call C<nbdkit_add_extent> for each extent found. > + > +Extents overlapping the range C<[offset...offset+count-1]> should be > +returned if possible. However nbdkit ignores extents E<lt> offset so > +the plugin may, if it is easier to implement, return all extent > +information for the whole disk. The plugin may return extents beyond > +the end of the range. It may also return extent information for less > +than the whole range, but it must return at least one extent > +overlapping C<offset>. > + > +The extents B<must> be added in ascending order, and B<must> be > +contiguous. > + > +The C<flags> parameter may contain the flag C<NBDKIT_FLAG_REQ_ONE> > +which means that the client is only requesting information about the > +extent overlapping C<offset>. The plugin may ignore this flag, or as > +an optimization it may return just a single extent for C<offset>.I wonder if this paragraph belongs up at [1]. Looks good to me. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-28  02:36 UTC
Re: [Libguestfs] [PATCH nbdkit v4 04/15] blocksize: Implement extents.
On 3/26/19 4:17 PM, Richard W.M. Jones wrote:> --- > filters/blocksize/nbdkit-blocksize-filter.pod | 4 ++-- > filters/blocksize/blocksize.c | 19 +++++++++++++++++++ > 2 files changed, 21 insertions(+), 2 deletions(-)ACK. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-28  02:45 UTC
Re: [Libguestfs] [PATCH nbdkit v4 05/15] cow: Disable extents information in this filter.
On 3/26/19 4:17 PM, Richard W.M. Jones wrote:> The cow filter doesn't support trimming and we should assume the > underlying plugin is fully allocated too. Note that both of these > limitations might be lifted with a more advanced filter > implementation. > > However we ought to support this in future because xz files contain > sparseness information, so add a note.Stale copy-and-paste comment? Also, since the series is mostly alphabetical, I suspect you're missing 4.5/15 for filters/cache/cache.c (blindly passing through .extents to the underlying plugin); easiest is probably forcing .can_extents false, but nicer would be remembering status of portions that we have cached locally to avoid repeated callouts to the plugin. ACK for the cow filter. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-28  02:47 UTC
Re: [Libguestfs] [PATCH nbdkit v4 06/15] delay: Allow block status (extents) requests to be separately delayed.
On 3/26/19 4:17 PM, Richard W.M. Jones wrote:> --- > filters/delay/nbdkit-delay-filter.pod | 8 ++++++++ > filters/delay/delay.c | 26 ++++++++++++++++++++++++++ > 2 files changed, 34 insertions(+) > > diff --git a/filters/delay/nbdkit-delay-filter.pod b/filters/delay/nbdkit-delay-filter.pod > index c2eb172..2e2ac74 100644 > --- a/filters/delay/nbdkit-delay-filter.pod > +++ b/filters/delay/nbdkit-delay-filter.pod > @@ -11,6 +11,7 @@ nbdkit-delay-filter - nbdkit delay filter > nbdkit --filter=delay plugin [plugin-args ...] > delay-read=(SECS|NNms) delay-write=(SECS|NNms) > delay-zero=(SECS|NNms) delay-trim=(SECS|NNms) > + delay-extents=(SECS|NNms) >ACK -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-28  12:06 UTC
Re: [Libguestfs] [PATCH nbdkit v4 07/15] error: Extend the error filter so it can inject errors into block status extents requests.
On 3/26/19 4:17 PM, Richard W.M. Jones wrote:> --- > filters/error/nbdkit-error-filter.pod | 11 ++++++-- > filters/error/error.c | 36 ++++++++++++++++++++++++--- > 2 files changed, 41 insertions(+), 6 deletions(-) >Copy and paste is fun :) ACK -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Mar-28  12:13 UTC
Re: [Libguestfs] [PATCH nbdkit v4 08/15] log: Log extents requests.
On 3/26/19 4:17 PM, Richard W.M. Jones wrote:> Typical output: > > 2019-03-26 18:22:17.287434 connection=2 Extents id=2 offset=0x0 count=0x40000000 req_one=1 ... > 2019-03-26 18:22:17.290040 connection=2 ...Extents id=2 extents=[{ offset=0x0, length=0x250000, type=0 }, { offset=0x250000, length=0x7db0000, type=3 }, { offset=0x8000000, length=0x8000, type=0 }, { offset=0x8008000, length=0x38000, type=3 }, { offset=0x8040000, length=0x208000, type=0 }, { offset=0x8248000, length=0x7db8000, type=3 }, { offset=0x10000000, length=0x208000, type=0 }, { offset=0x10208000, length=0x7df8000, type=3 }, { offset=0x18000000, length=0x8000, type=0 }, { offset=0x18008000, length=0x38000, type=3 }, { offset=0x18040000, length=0x208000, type=0 }, { offset=0x18248000, length=0x7db8000, type=3 }, { offset=0x20000000, length=0x208000, type=0 }, { offset=0x20208000, length=0x7df8000, type=3 }, { offset=0x28000000, length=0x8000, type=0 }, { offset=0x28008000, length=0x38000, type=3 }, { offset=0x28040000, length=0x208000, type=0 }, { offset=0x28248000, length=0x7db8000, type=3 }, { offset=0x30000000, length=0x208000, type=0 }, { offset=0x30208000, length=0x7df8000, type=3 }, { offset=0x38000000, length=0x8000, type=0 }, { offset=0x38008000, length=0x38000, type=3 }, { offset=0x38040000, length=0x208000, type=0 }, { offset=0x38248000, length=0x7da8000, type=3 }, { offset=0x3fff0000, length=0x10000, type=0 }] return=0 > > As you can see this is logging the complete information as generated > by the underlying plugin, not what is returned to the client (which is > rather hard for the log filter to discern). It's probably more useful > for debugging like this.Can get quite verbose, but that's par for a logger. If we wanted, we could introduce a log-abbreviate option, to produce something like: extents=[{ offset=0x0, length=0x250000, type=0 }, ...N more extents] where you get verbose by default, but can opt-in to the shorter output.> + else { > + FILE *fp; > + char *extents_str = NULL; > + size_t i, n, len = 0; > + > + fp = open_memstream (&extents_str, &len);Yeah, a memstream makes this easier (it is in POSIX, but not universally available; do we care?) Missing a patch to nbdkit-log-filter.pod:> This filter writes to the file specified by the C<logfile=FILE> > parameter. All lines include a timestamp, a connection counter, then > details about the command. The following actions are logged: Connect, > Read, Write, Zero, Trim, Flush, and Disconnect. Except for Connectwhere that sentence needs to be updated. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Reasonably Related Threads
- [PATCH nbdkit v5 FINAL 00/19] Implement extents.
- [PATCH nbdkit 0/8] Implement extents using a simpler array.
- [RFC nbdkit PATCH 0/3] aligned .extents
- [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.
- [nbdkit PATCH] offset, partition: Fix .extents with non-zero offset