Martin Kletzander
2019-Jun-11  08:49 UTC
[Libguestfs] [nbdkit PATCH v2] Introduce cacheextents filter
This filter caches the last result of the extents() call and offers a nice
speed-up for clients that only support req_one=1 in combination with plugins
like vddk, which has no overhead for returning information for multiple extents
in one call, but that call is very time-consuming.
Quick test showed that on a fast connection and a sparsely allocated 16G disk
with a OS installed `qemu-img map` runs 16s instead of 33s (out of which it
takes 5s to the first extents request).  For 100G disk with no data on it, that
is one hole extent spanning the whole disk (where there is no space for
improvement) this does not add any noticeable overhead.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
---
Notes:
    v2:
    
     - Clearing the REQ_ONE when forwarding the request to plugin
    
     - I hate the wording of the comment explaining it, please suggest a reword
(or
       just just fix it)
    
     - Tests are now using custom shell plugin logging all accesses and also
qemu-io
       testing all other expected behaviour
 configure.ac                                  |   2 +
 filters/cacheextents/Makefile.am              |  64 ++++++
 filters/cacheextents/cacheextents.c           | 196 ++++++++++++++++++
 .../nbdkit-cacheextents-filter.pod            |  47 +++++
 tests/Makefile.am                             |   4 +
 tests/test-cacheextents.sh                    | 110 ++++++++++
 6 files changed, 423 insertions(+)
 create mode 100644 filters/cacheextents/Makefile.am
 create mode 100644 filters/cacheextents/cacheextents.c
 create mode 100644 filters/cacheextents/nbdkit-cacheextents-filter.pod
 create mode 100755 tests/test-cacheextents.sh
diff --git a/configure.ac b/configure.ac
index 58031f3c6d86..8db5a4cac3c5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -825,6 +825,7 @@ plugins="$(echo $(printf %s\\n $lang_plugins
$non_lang_plugins | sort -u))"
 filters="\
         blocksize \
         cache \
+        cacheextents \
         cow \
         delay \
         error \
@@ -900,6 +901,7 @@ AC_CONFIG_FILES([Makefile
                  filters/Makefile
                  filters/blocksize/Makefile
                  filters/cache/Makefile
+                 filters/cacheextents/Makefile
                  filters/cow/Makefile
                  filters/delay/Makefile
                  filters/error/Makefile
diff --git a/filters/cacheextents/Makefile.am b/filters/cacheextents/Makefile.am
new file mode 100644
index 000000000000..9e5b59c09c59
--- /dev/null
+++ b/filters/cacheextents/Makefile.am
@@ -0,0 +1,64 @@
+# nbdkit
+# Copyright (C) 2019 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS
IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+include $(top_srcdir)/common-rules.mk
+
+EXTRA_DIST = nbdkit-cacheextents-filter.pod
+
+filter_LTLIBRARIES = nbdkit-cacheextents-filter.la
+
+nbdkit_cacheextents_filter_la_SOURCES = \
+	cacheextents.c \
+	$(top_srcdir)/include/nbdkit-filter.h
+
+nbdkit_cacheextents_filter_la_CPPFLAGS = \
+	-I$(top_srcdir)/include \
+	-I$(top_srcdir)/common/include \
+	-I$(top_srcdir)/common/utils
+nbdkit_cacheextents_filter_la_CFLAGS = \
+	$(WARNINGS_CFLAGS)
+nbdkit_cacheextents_filter_la_LDFLAGS = \
+	-module -avoid-version -shared \
+	-Wl,--version-script=$(top_srcdir)/filters/filters.syms
+nbdkit_cacheextents_filter_la_LIBADD = \
+	$(top_builddir)/common/utils/libutils.la
+
+if HAVE_POD
+
+man_MANS = nbdkit-cacheextents-filter.1
+CLEANFILES += $(man_MANS)
+
+nbdkit-cacheextents-filter.1: nbdkit-cacheextents-filter.pod
+	$(PODWRAPPER) --section=1 --man $@ \
+	    --html $(top_builddir)/html/$@.html \
+	    $<
+
+endif HAVE_POD
diff --git a/filters/cacheextents/cacheextents.c
b/filters/cacheextents/cacheextents.c
new file mode 100644
index 000000000000..57c29a8a8c6f
--- /dev/null
+++ b/filters/cacheextents/cacheextents.c
@@ -0,0 +1,196 @@
+/* nbdkit
+ * Copyright (C) 2019 Red Hat Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * * Neither the name of Red Hat nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS
IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+ * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <errno.h>
+#include <inttypes.h>
+
+#include <pthread.h>
+
+#include <nbdkit-filter.h>
+
+#include "cleanup.h"
+
+#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL
+
+/* This lock protects the global state. */
+static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
+
+/* Cached extents from the last extents () call and its start and end for the
+   sake of simplicity. */
+struct nbdkit_extents *cache_extents;
+static uint64_t cache_start;
+static uint64_t cache_end;
+
+static void
+cacheextents_unload (void)
+{
+  nbdkit_extents_free (cache_extents);
+}
+
+static int
+cacheextents_add (struct nbdkit_extents *extents, int *err)
+{
+  size_t i = 0;
+
+  for (i = 0; i < nbdkit_extents_count (cache_extents); i++) {
+    struct nbdkit_extent ex = nbdkit_get_extent (cache_extents, i);
+    if (nbdkit_add_extent (extents, ex.offset, ex.length, ex.type) == -1) {
+      *err = errno;
+      return -1;
+    }
+  }
+
+  return 0;
+}
+
+static int
+cacheextents_fill (struct nbdkit_extents *extents, int *err)
+{
+  size_t i = 0;
+  size_t count = nbdkit_extents_count (extents);
+  struct nbdkit_extent first = nbdkit_get_extent (extents, 0);
+  struct nbdkit_extent last = nbdkit_get_extent (extents, count - 1);
+
+  nbdkit_extents_free (cache_extents);
+  cache_start = first.offset;
+  cache_end = last.offset + last.length;
+  cache_extents = nbdkit_extents_new (cache_start, cache_end);
+
+  for (i = 0; i < count; i++) {
+    struct nbdkit_extent ex = nbdkit_get_extent (extents, i);
+    nbdkit_debug ("cacheextents: updating cache with"
+                  ": offset=%" PRIu64
+                  ": length=%" PRIu64
+                  "; type=%x",
+                  ex.offset, ex.length, ex.type);
+    if (nbdkit_add_extent (cache_extents, ex.offset, ex.length, ex.type) == -1)
{
+      *err = errno;
+      return -1;
+    }
+  }
+
+  return 0;
+}
+
+static int
+cacheextents_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
+                      void *handle, uint32_t count, uint64_t offset, uint32_t
flags,
+                      struct nbdkit_extents *extents,
+                      int *err)
+{
+  ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+
+  nbdkit_debug ("cacheextents:"
+                " cache_start=%" PRIu64
+                " cache_end=%" PRIu64
+                " cache_extents=%p",
+                cache_start, cache_end, cache_extents);
+
+  if (cache_extents &&
+      offset >= cache_start && offset < cache_end) {
+    nbdkit_debug ("cacheextents: returning from cache");
+    return cacheextents_add (extents, err);
+  }
+
+  nbdkit_debug ("cacheextents: cache miss");
+  /* Clearing the REQ_ONE here in order to get (possibly) more data from the
+   * plugin.  This should not increase the cost as plugins can still return
just
+   * one extent if it's too costly to return more. */
+  flags &= ~(NBDKIT_FLAG_REQ_ONE);
+  if (next_ops->extents (nxdata, count, offset, flags, extents, err) == -1)
+    return -1;
+
+  return cacheextents_fill (extents, err);
+}
+
+/* Any changes to the data need to clean the cache.
+ *
+ * Similarly to readahead filter this could be more intelligent, but there
would
+ * be very little benefit.
+ */
+
+static void
+kill_cacheextents (void)
+{
+  ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+  nbdkit_extents_free (cache_extents);
+  cache_extents = NULL;
+}
+
+static int
+cacheextents_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata,
+                     void *handle,
+                     const void *buf, uint32_t count, uint64_t offset,
+                     uint32_t flags, int *err)
+{
+  kill_cacheextents ();
+  return next_ops->pwrite (nxdata, buf, count, offset, flags, err);
+}
+
+static int
+cacheextents_trim (struct nbdkit_next_ops *next_ops, void *nxdata,
+                   void *handle,
+                   uint32_t count, uint64_t offset, uint32_t flags,
+                   int *err)
+{
+  kill_cacheextents ();
+  return next_ops->trim (nxdata, count, offset, flags, err);
+}
+
+static int
+cacheextents_zero (struct nbdkit_next_ops *next_ops, void *nxdata,
+                   void *handle,
+                   uint32_t count, uint64_t offset, uint32_t flags,
+                   int *err)
+{
+  kill_cacheextents ();
+  return next_ops->zero (nxdata, count, offset, flags, err);
+}
+
+static struct nbdkit_filter filter = {
+  .name              = "cacheextents",
+  .longname          = "nbdkit cacheextents filter",
+  .version           = PACKAGE_VERSION,
+  .unload            = cacheextents_unload,
+  .pwrite            = cacheextents_pwrite,
+  .trim              = cacheextents_trim,
+  .zero              = cacheextents_zero,
+  .extents           = cacheextents_extents,
+};
+
+NBDKIT_REGISTER_FILTER (filter)
diff --git a/filters/cacheextents/nbdkit-cacheextents-filter.pod
b/filters/cacheextents/nbdkit-cacheextents-filter.pod
new file mode 100644
index 000000000000..88f3173b9375
--- /dev/null
+++ b/filters/cacheextents/nbdkit-cacheextents-filter.pod
@@ -0,0 +1,47 @@
+=head1 NAME
+
+nbdkit-cacheextents-filter - cache extents
+
+=head1 SYNOPSIS
+
+ nbdkit --filter=cacheextents plugin
+
+=head1 DESCRIPTION
+
+C<nbdkit-cacheextents-filter> is a filter that caches the result of last
+extents() call.
+
+A common use for this filter is to accelerate returning extents data for
+clients which ask for extents with the REQ_ONE flag set (like
+S<C<qemu-img convert>>) and is especially useful in combination
with
+plugins that report multiple extents in one call, but with high latency
+for each of those calls (like L<nbdkit-vddk-plugin(1)>).  For example:
+
+ nbdkit -U - --filter=cacheextents --run 'qemu-img map $nbd' vddk ...
+
+For files with big extents (when it is unlikely for one extents() call
+to return multiple different extents) this does not slow down the
+access.
+
+=head1 PARAMETERS
+
+There are no parameters specific to nbdkit-cacheextents-filter.  Any
+parameters are passed through to and processed by the underlying
+plugin in the normal way.
+
+=head1 SEE ALSO
+
+L<nbdkit(1)>,
+L<nbdkit-cache-filter(1)>,
+L<nbdkit-readahead-filter(1)>,
+L<nbdkit-vddk-plugin(1)>,
+L<nbdkit-filter(3)>,
+L<qemu-img(1)>.
+
+=head1 AUTHORS
+
+Martin Kletzander
+
+=head1 COPYRIGHT
+
+Copyright (C) 2019 Red Hat Inc.
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 4148793c7297..aaf74502783f 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -47,6 +47,7 @@ EXTRA_DIST = \
 	test-cache.sh \
 	test-cache-max-size.sh \
 	test-cache-on-read.sh \
+	test-cacheextents.sh \
 	test-captive.sh \
 	test-cow.sh \
 	test-cxx.sh \
@@ -865,6 +866,9 @@ TESTS += \
 endif HAVE_GUESTFISH
 TESTS += test-cache-max-size.sh
 
+# cacheextents filter test.
+TESTS += test-cacheextents.sh
+
 # cow filter test.
 if HAVE_GUESTFISH
 TESTS += test-cow.sh
diff --git a/tests/test-cacheextents.sh b/tests/test-cacheextents.sh
new file mode 100755
index 000000000000..5b0667ad2b95
--- /dev/null
+++ b/tests/test-cacheextents.sh
@@ -0,0 +1,110 @@
+#!/usr/bin/env bash
+# nbdkit
+# Copyright (C) 2019 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS
IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+source ./functions.sh
+set -x
+set -e
+
+requires grep --version
+requires qemu-img --version
+requires qemu-io --version
+
+sock="$(mktemp -u)"
+sockurl="nbd+unix:///?socket=$sock"
+pidfile="test-cacheextents.pid"
+accessfile="test-cacheextents-access.log"
+accessfile_full="$(pwd)/test-cacheextents-access.log"
+files="$pidfile $sock"
+rm -f $files $accessfile
+cleanup_fn rm -f $files
+
+# Intentionally using EOF rather than 'EOF' so that we can pass in the
+# $accessfile_full
+start_nbdkit \
+    -P $pidfile \
+    -U $sock \
+    --filter=cacheextents \
+    sh - <<EOF
+echo "Call: \$@" >>$accessfile_full
+size=4M
+block_size=\$((1024*1024))
+case "\$1" in
+  get_size) echo \$size ;;
+  can_extents) ;;
+  extents)
+    echo "extents request: \$@" >>$accessfile_full
+    offset=\$((\$4/\$block_size))
+    count=\$((\$3/\$block_size))
+    length=\$((\$offset+\$count))
+    for i in \$(seq \$offset \$length); do
+      echo \${i}M \$block_size \$((i%4)) >>$accessfile_full
+      echo \${i}M \$block_size \$((i%4))
+    done
+    ;;
+  pread) dd if=/dev/zero count=\$3 iflag=count_bytes ;;
+  can_write) ;;
+  pwrite) dd of=/dev/null ;;
+  can_trim) ;;
+  trim) ;;
+  can_zero) ;;
+  trim) ;;
+  *) exit 2 ;;
+esac
+EOF
+
+
+test_me() {
+    num_accesses=$1
+    shift
+
+    qemu-io -f raw "$@" "$sockurl"
+    test "$(grep -c "^extents request: " $accessfile)" -eq
"$num_accesses"
+    ret=$?
+    rm -f "$accessfile"
+    return $ret
+}
+
+# First one causes caching, the rest should be returned from cache.
+test_me 1 -c 'map' -c 'map' -c 'map'
+# First one is still cached from last time, discard should kill the cache, then
+# one request should go through.
+test_me 1 -c 'map' -c 'discard 0 1' -c 'map'
+# Same as above, only this time the cache is killed before all the operations
as
+# well.  This is used from now on to clear the cache as it seems nicer and
+# faster than running new nbdkit for each test.
+test_me 2 -c 'discard 0 1' -c 'map' -c 'discard 0 1' -c
'map'
+# Write should kill the cache as well.
+test_me 2 -c 'discard 0 1' -c 'map' -c 'write 0 1' -c
'map'
+# Alloc should use cached data from map
+test_me 1 -c 'discard 0 1' -c 'map' -c 'alloc 0'
+# Read should not kill the cache
+test_me 1 -c 'discard 0 1' -c 'map' -c 'read 0 1' -c
'map' -c 'alloc 0'
-- 
2.21.0
Eric Blake
2019-Jun-11  13:49 UTC
Re: [Libguestfs] [nbdkit PATCH v2] Introduce cacheextents filter
On 6/11/19 3:49 AM, Martin Kletzander wrote:> This filter caches the last result of the extents() call and offers a nice > speed-up for clients that only support req_one=1 in combination with plugins > like vddk, which has no overhead for returning information for multiple extents > in one call, but that call is very time-consuming. > > Quick test showed that on a fast connection and a sparsely allocated 16G disk > with a OS installed `qemu-img map` runs 16s instead of 33s (out of which it > takes 5s to the first extents request). For 100G disk with no data on it, that > is one hole extent spanning the whole disk (where there is no space for > improvement) this does not add any noticeable overhead. > > Signed-off-by: Martin Kletzander <mkletzan@redhat.com> > --- > > Notes: > v2: > > - Clearing the REQ_ONE when forwarding the request to plugin > > - I hate the wording of the comment explaining it, please suggest a reword (or > just just fix it)I'll give it a shot.> > - Tests are now using custom shell plugin logging all accesses and also qemu-io > testing all other expected behaviour >> +++ b/filters/cacheextents/cacheextents.c> + > +/* Cached extents from the last extents () call and its start and end for the > + sake of simplicity. */The prevailing style seems to be the use of leading ' *' on subsequent comment lines, and */ on its own line.> +struct nbdkit_extents *cache_extents; > +static uint64_t cache_start; > +static uint64_t cache_end; > + > +static void > +cacheextents_unload (void) > +{ > + nbdkit_extents_free (cache_extents); > +} > + > +static int > +cacheextents_add (struct nbdkit_extents *extents, int *err) > +{ > + size_t i = 0; > + > + for (i = 0; i < nbdkit_extents_count (cache_extents); i++) { > + struct nbdkit_extent ex = nbdkit_get_extent (cache_extents, i); > + if (nbdkit_add_extent (extents, ex.offset, ex.length, ex.type) == -1) { > + *err = errno; > + return -1; > + } > + } > + > + return 0; > +} > + > +static int > +cacheextents_fill (struct nbdkit_extents *extents, int *err) > +{ > + size_t i = 0; > + size_t count = nbdkit_extents_count (extents); > + struct nbdkit_extent first = nbdkit_get_extent (extents, 0); > + struct nbdkit_extent last = nbdkit_get_extent (extents, count - 1); > + > + nbdkit_extents_free (cache_extents); > + cache_start = first.offset; > + cache_end = last.offset + last.length; > + cache_extents = nbdkit_extents_new (cache_start, cache_end);Needs a check for failure. Hmm - my earlier comments about appending to the cache are hard if we cap it to cache_end; I may still play with tweaking the cache algorithm to allow appending, but it would be a separate patch on top.> + > + for (i = 0; i < count; i++) { > + struct nbdkit_extent ex = nbdkit_get_extent (extents, i); > + nbdkit_debug ("cacheextents: updating cache with" > + ": offset=%" PRIu64 > + ": length=%" PRIu64 > + "; type=%x",Punctuation looks a bit funny; I'll probably normalize to updating cache with: offset=512 length=512 type=0 to match...> + ex.offset, ex.length, ex.type); > + if (nbdkit_add_extent (cache_extents, ex.offset, ex.length, ex.type) == -1) { > + *err = errno; > + return -1; > + } > + } > + > + return 0; > +} > + > +static int > +cacheextents_extents (struct nbdkit_next_ops *next_ops, void *nxdata, > + void *handle, uint32_t count, uint64_t offset, uint32_t flags, > + struct nbdkit_extents *extents, > + int *err) > +{ > + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); > + > + nbdkit_debug ("cacheextents:" > + " cache_start=%" PRIu64 > + " cache_end=%" PRIu64 > + " cache_extents=%p",...this style.> + cache_start, cache_end, cache_extents); > + > + if (cache_extents && > + offset >= cache_start && offset < cache_end) { > + nbdkit_debug ("cacheextents: returning from cache"); > + return cacheextents_add (extents, err); > + } > + > + nbdkit_debug ("cacheextents: cache miss"); > + /* Clearing the REQ_ONE here in order to get (possibly) more data from the > + * plugin. This should not increase the cost as plugins can still return just > + * one extent if it's too costly to return more. */ > + flags &= ~(NBDKIT_FLAG_REQ_ONE); > + if (next_ops->extents (nxdata, count, offset, flags, extents, err) == -1) > + return -1; > + > + return cacheextents_fill (extents, err); > +} > + > +/* Any changes to the data need to clean the cache. > + * > + * Similarly to readahead filter this could be more intelligent, but there would > + * be very little benefit. > + */ > + > +static void > +kill_cacheextents (void)> +++ b/filters/cacheextents/nbdkit-cacheextents-filter.pod > @@ -0,0 +1,47 @@ > +=head1 NAME > + > +nbdkit-cacheextents-filter - cache extents > + > +=head1 SYNOPSIS > + > + nbdkit --filter=cacheextents plugin > + > +=head1 DESCRIPTION > + > +C<nbdkit-cacheextents-filter> is a filter that caches the result of last > +extents() call. > + > +A common use for this filter is to accelerate returning extents data for > +clients which ask for extents with the REQ_ONE flag set (like > +S<C<qemu-img convert>>) and is especially useful in combination with > +plugins that report multiple extents in one call, but with high latency > +for each of those calls (like L<nbdkit-vddk-plugin(1)>). For example:You asked for a re-word; how about this (with proper markup): A common use for this filter is to improve performance when using a client performing a linear pass over the entire image while asking for only one extent at a time (such as qemu-img convert), but where the plugin can provide multiple extents for the same high latency as a single extent (such as nbdkit-vddk-plugin).> + > + nbdkit -U - --filter=cacheextents --run 'qemu-img map $nbd' vddk ... > + > +For files with big extents (when it is unlikely for one extents() call > +to return multiple different extents) this does not slow down the > +access.I'd also like to see a mention of nbdkit-cache-filter, probably worded as: Note that this caches only plugin metadata; this filter can be combined with L<nbdkit-cache-filter(1)> to also cache data. and a parallel sentence added to nbdkit-cache-filter.pod.> diff --git a/tests/test-cacheextents.sh b/tests/test-cacheextents.sh > new file mode 100755 > index 000000000000..5b0667ad2b95 > --- /dev/null > +++ b/tests/test-cacheextents.sh > @@ -0,0 +1,110 @@> +requires grep --version > +requires qemu-img --version > +requires qemu-io --version > + > +sock="$(mktemp -u)" > +sockurl="nbd+unix:///?socket=$sock" > +pidfile="test-cacheextents.pid" > +accessfile="test-cacheextents-access.log" > +accessfile_full="$(pwd)/test-cacheextents-access.log"$PWD saves a process compared to $(pwd)> +files="$pidfile $sock" > +rm -f $files $accessfile > +cleanup_fn rm -f $files > + > +# Intentionally using EOF rather than 'EOF' so that we can pass in the > +# $accessfile_full > +start_nbdkit \ > + -P $pidfile \ > + -U $sock \ > + --filter=cacheextents \ > + sh - <<EOF > +echo "Call: \$@" >>$accessfile_full > +size=4M > +block_size=\$((1024*1024)) > +case "\$1" in > + get_size) echo \$size ;; > + can_extents) ;; > + extents) > + echo "extents request: \$@" >>$accessfile_full > + offset=\$((\$4/\$block_size)) > + count=\$((\$3/\$block_size)) > + length=\$((\$offset+\$count))Adding spaces around the operators would make this more legible.> + for i in \$(seq \$offset \$length); do > + echo \${i}M \$block_size \$((i%4)) >>$accessfile_full > + echo \${i}M \$block_size \$((i%4)) > + done > + ;; > + pread) dd if=/dev/zero count=\$3 iflag=count_bytes ;; > + can_write) ;; > + pwrite) dd of=/dev/null ;; > + can_trim) ;; > + trim) ;; > + can_zero) ;; > + trim) ;;zero)> + *) exit 2 ;; > +esac > +EOF > + > + > +test_me() { > + num_accesses=$1 > + shift > + > + qemu-io -f raw "$@" "$sockurl" > + test "$(grep -c "^extents request: " $accessfile)" -eq "$num_accesses" > + ret=$? > + rm -f "$accessfile" > + return $ret > +} > + > +# First one causes caching, the rest should be returned from cache. > +test_me 1 -c 'map' -c 'map' -c 'map' > +# First one is still cached from last time, discard should kill the cache, then > +# one request should go through. > +test_me 1 -c 'map' -c 'discard 0 1' -c 'map' > +# Same as above, only this time the cache is killed before all the operations as > +# well. This is used from now on to clear the cache as it seems nicer and > +# faster than running new nbdkit for each test. > +test_me 2 -c 'discard 0 1' -c 'map' -c 'discard 0 1' -c 'map' > +# Write should kill the cache as well. > +test_me 2 -c 'discard 0 1' -c 'map' -c 'write 0 1' -c 'map' > +# Alloc should use cached data from map > +test_me 1 -c 'discard 0 1' -c 'map' -c 'alloc 0' > +# Read should not kill the cache > +test_me 1 -c 'discard 0 1' -c 'map' -c 'read 0 1' -c 'map' -c 'alloc 0' >Looks good. Unless you have objections, I'll go ahead and make the tweaks mentioned, and push this. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Martin Kletzander
2019-Jun-11  14:31 UTC
Re: [Libguestfs] [nbdkit PATCH v2] Introduce cacheextents filter
On Tue, Jun 11, 2019 at 08:49:37AM -0500, Eric Blake wrote:>On 6/11/19 3:49 AM, Martin Kletzander wrote: >> This filter caches the last result of the extents() call and offers a nice >> speed-up for clients that only support req_one=1 in combination with plugins >> like vddk, which has no overhead for returning information for multiple extents >> in one call, but that call is very time-consuming. >> >> Quick test showed that on a fast connection and a sparsely allocated 16G disk >> with a OS installed `qemu-img map` runs 16s instead of 33s (out of which it >> takes 5s to the first extents request). For 100G disk with no data on it, that >> is one hole extent spanning the whole disk (where there is no space for >> improvement) this does not add any noticeable overhead. >> >> Signed-off-by: Martin Kletzander <mkletzan@redhat.com> >> --- >> >> Notes: >> v2: >> >> - Clearing the REQ_ONE when forwarding the request to plugin >> >> - I hate the wording of the comment explaining it, please suggest a reword (or >> just just fix it) > >I'll give it a shot. > >> >> - Tests are now using custom shell plugin logging all accesses and also qemu-io >> testing all other expected behaviour >> > >> +++ b/filters/cacheextents/cacheextents.c > >> + >> +/* Cached extents from the last extents () call and its start and end for the >> + sake of simplicity. */ > >The prevailing style seems to be the use of leading ' *' on subsequent >comment lines, and */ on its own line. > >> +struct nbdkit_extents *cache_extents; >> +static uint64_t cache_start; >> +static uint64_t cache_end; >> + >> +static void >> +cacheextents_unload (void) >> +{ >> + nbdkit_extents_free (cache_extents); >> +} >> + >> +static int >> +cacheextents_add (struct nbdkit_extents *extents, int *err) >> +{ >> + size_t i = 0; >> + >> + for (i = 0; i < nbdkit_extents_count (cache_extents); i++) { >> + struct nbdkit_extent ex = nbdkit_get_extent (cache_extents, i); >> + if (nbdkit_add_extent (extents, ex.offset, ex.length, ex.type) == -1) { >> + *err = errno; >> + return -1; >> + } >> + } >> + >> + return 0; >> +} >> + >> +static int >> +cacheextents_fill (struct nbdkit_extents *extents, int *err) >> +{ >> + size_t i = 0; >> + size_t count = nbdkit_extents_count (extents); >> + struct nbdkit_extent first = nbdkit_get_extent (extents, 0); >> + struct nbdkit_extent last = nbdkit_get_extent (extents, count - 1); >> + >> + nbdkit_extents_free (cache_extents); >> + cache_start = first.offset; >> + cache_end = last.offset + last.length; >> + cache_extents = nbdkit_extents_new (cache_start, cache_end); > >Needs a check for failure. > >Hmm - my earlier comments about appending to the cache are hard if we >cap it to cache_end; I may still play with tweaking the cache algorithm >to allow appending, but it would be a separate patch on top. >Well, it just means that if this is about to be rewritten, then it cannot use the nbd_extents struct provided and instead an internal one (that would allow growing in both directions, maybe also being non-contiguous) would be used here.>> + >> + for (i = 0; i < count; i++) { >> + struct nbdkit_extent ex = nbdkit_get_extent (extents, i); >> + nbdkit_debug ("cacheextents: updating cache with" >> + ": offset=%" PRIu64 >> + ": length=%" PRIu64 >> + "; type=%x", > >Punctuation looks a bit funny; I'll probably normalize to >Really weird leftover, thanks for noticing>updating cache with: offset=512 length=512 type=0 > >to match... > >> + ex.offset, ex.length, ex.type); >> + if (nbdkit_add_extent (cache_extents, ex.offset, ex.length, ex.type) == -1) { >> + *err = errno; >> + return -1; >> + } >> + } >> + >> + return 0; >> +} >> + >> +static int >> +cacheextents_extents (struct nbdkit_next_ops *next_ops, void *nxdata, >> + void *handle, uint32_t count, uint64_t offset, uint32_t flags, >> + struct nbdkit_extents *extents, >> + int *err) >> +{ >> + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); >> + >> + nbdkit_debug ("cacheextents:" >> + " cache_start=%" PRIu64 >> + " cache_end=%" PRIu64 >> + " cache_extents=%p", > >...this style. > >> + cache_start, cache_end, cache_extents); >> + >> + if (cache_extents && >> + offset >= cache_start && offset < cache_end) { >> + nbdkit_debug ("cacheextents: returning from cache"); >> + return cacheextents_add (extents, err); >> + } >> + >> + nbdkit_debug ("cacheextents: cache miss"); >> + /* Clearing the REQ_ONE here in order to get (possibly) more data from the >> + * plugin. This should not increase the cost as plugins can still return just >> + * one extent if it's too costly to return more. */ >> + flags &= ~(NBDKIT_FLAG_REQ_ONE); >> + if (next_ops->extents (nxdata, count, offset, flags, extents, err) == -1) >> + return -1; >> + >> + return cacheextents_fill (extents, err); >> +} >> + >> +/* Any changes to the data need to clean the cache. >> + * >> + * Similarly to readahead filter this could be more intelligent, but there would >> + * be very little benefit. >> + */ >> + >> +static void >> +kill_cacheextents (void) > >> +++ b/filters/cacheextents/nbdkit-cacheextents-filter.pod >> @@ -0,0 +1,47 @@ >> +=head1 NAME >> + >> +nbdkit-cacheextents-filter - cache extents >> + >> +=head1 SYNOPSIS >> + >> + nbdkit --filter=cacheextents plugin >> + >> +=head1 DESCRIPTION >> + >> +C<nbdkit-cacheextents-filter> is a filter that caches the result of last >> +extents() call. >> + >> +A common use for this filter is to accelerate returning extents data for >> +clients which ask for extents with the REQ_ONE flag set (like >> +S<C<qemu-img convert>>) and is especially useful in combination with >> +plugins that report multiple extents in one call, but with high latency >> +for each of those calls (like L<nbdkit-vddk-plugin(1)>). For example: > >You asked for a re-word; how about this (with proper markup): > >A common use for this filter is to improve performance when using a >client performing a linear pass over the entire image while asking for >only one extent at a time (such as qemu-img convert), but where the >plugin can provide multiple extents for the same high latency as a >single extent (such as nbdkit-vddk-plugin). >Yes, this reads better (although the one I really wanted to reword was the comment in cacheextents_extents(): + /* Clearing the REQ_ONE here in order to get (possibly) more data from the + * plugin. This should not increase the cost as plugins can still return just + * one extent if it's too costly to return more. */ + flags &= ~(NBDKIT_FLAG_REQ_ONE); ...>> + >> + nbdkit -U - --filter=cacheextents --run 'qemu-img map $nbd' vddk ... >> + >> +For files with big extents (when it is unlikely for one extents() call >> +to return multiple different extents) this does not slow down the >> +access. > >I'd also like to see a mention of nbdkit-cache-filter, probably worded as: > >Note that this caches only plugin metadata; this filter can be combined >with L<nbdkit-cache-filter(1)> to also cache data. > >and a parallel sentence added to nbdkit-cache-filter.pod. >and nbdkit-filters, yes, I forgot to do this even though you already pointed that out in the previous version. I just got to this after too long.> >> diff --git a/tests/test-cacheextents.sh b/tests/test-cacheextents.sh >> new file mode 100755 >> index 000000000000..5b0667ad2b95 >> --- /dev/null >> +++ b/tests/test-cacheextents.sh >> @@ -0,0 +1,110 @@ > >> +requires grep --version >> +requires qemu-img --version >> +requires qemu-io --version >> + >> +sock="$(mktemp -u)" >> +sockurl="nbd+unix:///?socket=$sock" >> +pidfile="test-cacheextents.pid" >> +accessfile="test-cacheextents-access.log" >> +accessfile_full="$(pwd)/test-cacheextents-access.log" > >$PWD saves a process compared to $(pwd) > >> +files="$pidfile $sock" >> +rm -f $files $accessfile >> +cleanup_fn rm -f $files >> + >> +# Intentionally using EOF rather than 'EOF' so that we can pass in the >> +# $accessfile_full >> +start_nbdkit \ >> + -P $pidfile \ >> + -U $sock \ >> + --filter=cacheextents \ >> + sh - <<EOF >> +echo "Call: \$@" >>$accessfile_full >> +size=4M >> +block_size=\$((1024*1024)) >> +case "\$1" in >> + get_size) echo \$size ;; >> + can_extents) ;; >> + extents) >> + echo "extents request: \$@" >>$accessfile_full >> + offset=\$((\$4/\$block_size)) >> + count=\$((\$3/\$block_size)) >> + length=\$((\$offset+\$count)) > >Adding spaces around the operators would make this more legible. > >> + for i in \$(seq \$offset \$length); do >> + echo \${i}M \$block_size \$((i%4)) >>$accessfile_full >> + echo \${i}M \$block_size \$((i%4)) >> + done >> + ;; >> + pread) dd if=/dev/zero count=\$3 iflag=count_bytes ;; >> + can_write) ;; >> + pwrite) dd of=/dev/null ;; >> + can_trim) ;; >> + trim) ;; >> + can_zero) ;; >> + trim) ;; > >zero) > >> + *) exit 2 ;; >> +esac >> +EOF >> + >> + >> +test_me() { >> + num_accesses=$1 >> + shift >> + >> + qemu-io -f raw "$@" "$sockurl" >> + test "$(grep -c "^extents request: " $accessfile)" -eq "$num_accesses" >> + ret=$? >> + rm -f "$accessfile" >> + return $ret >> +} >> + >> +# First one causes caching, the rest should be returned from cache. >> +test_me 1 -c 'map' -c 'map' -c 'map' >> +# First one is still cached from last time, discard should kill the cache, then >> +# one request should go through. >> +test_me 1 -c 'map' -c 'discard 0 1' -c 'map' >> +# Same as above, only this time the cache is killed before all the operations as >> +# well. This is used from now on to clear the cache as it seems nicer and >> +# faster than running new nbdkit for each test. >> +test_me 2 -c 'discard 0 1' -c 'map' -c 'discard 0 1' -c 'map' >> +# Write should kill the cache as well. >> +test_me 2 -c 'discard 0 1' -c 'map' -c 'write 0 1' -c 'map' >> +# Alloc should use cached data from map >> +test_me 1 -c 'discard 0 1' -c 'map' -c 'alloc 0' >> +# Read should not kill the cache >> +test_me 1 -c 'discard 0 1' -c 'map' -c 'read 0 1' -c 'map' -c 'alloc 0' >> > >Looks good. Unless you have objections, I'll go ahead and make the >tweaks mentioned, and push this. >Yes, all of them make sense, thank you very much.>-- >Eric Blake, Principal Software Engineer >Red Hat, Inc. +1-919-301-3226 >Virtualization: qemu.org | libvirt.org >
Eric Blake
2019-Jun-11  16:35 UTC
Re: [Libguestfs] [nbdkit PATCH v2] Introduce cacheextents filter
On 6/11/19 3:49 AM, Martin Kletzander wrote:> This filter caches the last result of the extents() call and offers a nice > speed-up for clients that only support req_one=1 in combination with plugins > like vddk, which has no overhead for returning information for multiple extents > in one call, but that call is very time-consuming. > > Quick test showed that on a fast connection and a sparsely allocated 16G disk > with a OS installed `qemu-img map` runs 16s instead of 33s (out of which it > takes 5s to the first extents request). For 100G disk with no data on it, that > is one hole extent spanning the whole disk (where there is no space for > improvement) this does not add any noticeable overhead. > > Signed-off-by: Martin Kletzander <mkletzan@redhat.com> > --- >> +static int > +cacheextents_fill (struct nbdkit_extents *extents, int *err) > +{ > + size_t i = 0; > + size_t count = nbdkit_extents_count (extents); > + struct nbdkit_extent first = nbdkit_get_extent (extents, 0); > + struct nbdkit_extent last = nbdkit_get_extent (extents, count - 1); > + > + nbdkit_extents_free (cache_extents); > + cache_start = first.offset; > + cache_end = last.offset + last.length; > + cache_extents = nbdkit_extents_new (cache_start, cache_end); > + > + for (i = 0; i < count; i++) { > + struct nbdkit_extent ex = nbdkit_get_extent (extents, i); > + nbdkit_debug ("cacheextents: updating cache with" > + ": offset=%" PRIu64 > + ": length=%" PRIu64 > + "; type=%x", > + ex.offset, ex.length, ex.type); > + if (nbdkit_add_extent (cache_extents, ex.offset, ex.length, ex.type) == -1) { > + *err = errno; > + return -1; > + }We need to clear cache_extents on this error path (otherwise cache_end is wrong, and future extents requests may try to read from the cache but fail, when they should instead be treated as a cache miss). Sorry I missed it during my initial review; I'll push the patch shortly. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Jun-11  19:42 UTC
Re: [Libguestfs] [nbdkit PATCH v2] Introduce cacheextents filter
On 6/11/19 3:49 AM, Martin Kletzander wrote:> This filter caches the last result of the extents() call and offers a nice > speed-up for clients that only support req_one=1 in combination with plugins > like vddk, which has no overhead for returning information for multiple extents > in one call, but that call is very time-consuming. > > Quick test showed that on a fast connection and a sparsely allocated 16G disk > with a OS installed `qemu-img map` runs 16s instead of 33s (out of which it > takes 5s to the first extents request). For 100G disk with no data on it, that > is one hole extent spanning the whole disk (where there is no space for > improvement) this does not add any noticeable overhead. > > Signed-off-by: Martin Kletzander <mkletzan@redhat.com> > --- >> + > +/* This lock protects the global state. */ > +static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; > + > +/* Cached extents from the last extents () call and its start and end for the > + sake of simplicity. */ > +struct nbdkit_extents *cache_extents; > +static uint64_t cache_start; > +static uint64_t cache_end;This shares a common cache across multiple connections. If the plugin ever returns connection-dependent differences (such as different sizes according to when the connection was made), we may have to change this to track a cache per connection. But then again, this is not the only filter to have that issue (at least the cache filter has the same issue of assuming that it can share a single cache among all connections). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Martin Kletzander
2019-Jun-12  21:43 UTC
Re: [Libguestfs] [nbdkit PATCH v2] Introduce cacheextents filter
On Tue, Jun 11, 2019 at 02:42:29PM -0500, Eric Blake wrote:>On 6/11/19 3:49 AM, Martin Kletzander wrote: >> This filter caches the last result of the extents() call and offers a nice >> speed-up for clients that only support req_one=1 in combination with plugins >> like vddk, which has no overhead for returning information for multiple extents >> in one call, but that call is very time-consuming. >> >> Quick test showed that on a fast connection and a sparsely allocated 16G disk >> with a OS installed `qemu-img map` runs 16s instead of 33s (out of which it >> takes 5s to the first extents request). For 100G disk with no data on it, that >> is one hole extent spanning the whole disk (where there is no space for >> improvement) this does not add any noticeable overhead. >> >> Signed-off-by: Martin Kletzander <mkletzan@redhat.com> >> --- >> > >> + >> +/* This lock protects the global state. */ >> +static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; >> + >> +/* Cached extents from the last extents () call and its start and end for the >> + sake of simplicity. */ >> +struct nbdkit_extents *cache_extents; >> +static uint64_t cache_start; >> +static uint64_t cache_end; > >This shares a common cache across multiple connections. If the plugin >ever returns connection-dependent differences (such as different sizes >according to when the connection was made), we may have to change this >to track a cache per connection. But then again, this is not the only >filter to have that issue (at least the cache filter has the same issue >of assuming that it can share a single cache among all connections). >Well, I specifically changed this so that it doesn't cache the size of the image. In this case the cache would be invalid if someone modified the image locally without the use of nbdkit. That is something you cannot avoid. With a per-connection cache it would be invalid if any other connection did a write/discard/zero as it would not update the caches for other connections. So I think this is actually a better approach in the end.>-- >Eric Blake, Principal Software Engineer >Red Hat, Inc. +1-919-301-3226 >Virtualization: qemu.org | libvirt.org >