Martin Kletzander
2019-May-15 12:54 UTC
[Libguestfs] [nbdkit PATCH] Introduce cacheextents filter
This filter caches the last result of the extents() call and offers a nice speed-up for clients that only support req_on=1 in combination with plugins like vddk, which has no overhead for returning information for multiple extents in one call, but that call is very time-consuming. Quick test showed that on a fast connection and a sparsely allocated 16G disk with a OS installed `qemu-img map` runs 16s instead of 33s (out of which it takes 5s to the first extents request). For 100G disk with no data on it, that is one hole extent spanning the whole disk (where there is no space for improvement) this does not add any noticeable overhead. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- Yes, I hate the filter name, so if anyone has a better idea, feel free to suggest (or rename) it. configure.ac | 2 + filters/cacheextents/Makefile.am | 64 ++++++ filters/cacheextents/cacheextents.c | 200 ++++++++++++++++++ .../nbdkit-cacheextents-filter.pod | 47 ++++ 4 files changed, 313 insertions(+) create mode 100644 filters/cacheextents/Makefile.am create mode 100644 filters/cacheextents/cacheextents.c create mode 100644 filters/cacheextents/nbdkit-cacheextents-filter.pod diff --git a/configure.ac b/configure.ac index 58031f3c6d86..8db5a4cac3c5 100644 --- a/configure.ac +++ b/configure.ac @@ -825,6 +825,7 @@ plugins="$(echo $(printf %s\\n $lang_plugins $non_lang_plugins | sort -u))" filters="\ blocksize \ cache \ + cacheextents \ cow \ delay \ error \ @@ -900,6 +901,7 @@ AC_CONFIG_FILES([Makefile filters/Makefile filters/blocksize/Makefile filters/cache/Makefile + filters/cacheextents/Makefile filters/cow/Makefile filters/delay/Makefile filters/error/Makefile diff --git a/filters/cacheextents/Makefile.am b/filters/cacheextents/Makefile.am new file mode 100644 index 000000000000..9e5b59c09c59 --- /dev/null +++ b/filters/cacheextents/Makefile.am @@ -0,0 +1,64 @@ +# nbdkit +# Copyright (C) 2019 Red Hat Inc. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +include $(top_srcdir)/common-rules.mk + +EXTRA_DIST = nbdkit-cacheextents-filter.pod + +filter_LTLIBRARIES = nbdkit-cacheextents-filter.la + +nbdkit_cacheextents_filter_la_SOURCES = \ + cacheextents.c \ + $(top_srcdir)/include/nbdkit-filter.h + +nbdkit_cacheextents_filter_la_CPPFLAGS = \ + -I$(top_srcdir)/include \ + -I$(top_srcdir)/common/include \ + -I$(top_srcdir)/common/utils +nbdkit_cacheextents_filter_la_CFLAGS = \ + $(WARNINGS_CFLAGS) +nbdkit_cacheextents_filter_la_LDFLAGS = \ + -module -avoid-version -shared \ + -Wl,--version-script=$(top_srcdir)/filters/filters.syms +nbdkit_cacheextents_filter_la_LIBADD = \ + $(top_builddir)/common/utils/libutils.la + +if HAVE_POD + +man_MANS = nbdkit-cacheextents-filter.1 +CLEANFILES += $(man_MANS) + +nbdkit-cacheextents-filter.1: nbdkit-cacheextents-filter.pod + $(PODWRAPPER) --section=1 --man $@ \ + --html $(top_builddir)/html/$@.html \ + $< + +endif HAVE_POD diff --git a/filters/cacheextents/cacheextents.c b/filters/cacheextents/cacheextents.c new file mode 100644 index 000000000000..89ac2e7a0767 --- /dev/null +++ b/filters/cacheextents/cacheextents.c @@ -0,0 +1,200 @@ +/* nbdkit + * Copyright (C) 2019 Red Hat Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <string.h> +#include <errno.h> +#include <inttypes.h> + +#include <pthread.h> + +#include <nbdkit-filter.h> + +#include "cleanup.h" + +#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL + +/* This lock protects the global state. */ +static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; + +/* The real size of the underlying plugin. */ +static uint64_t size; + +/* Cached extents from the last extents () call and its start and end for the + sake of simplicity. */ +struct nbdkit_extents *cache_extents; +static uint64_t cache_start; +static uint64_t cache_end; + +static void +cacheextents_unload (void) +{ + nbdkit_extents_free (cache_extents); +} + +static int +cacheextents_add (struct nbdkit_extents *extents) +{ + size_t i = 0; + + for (i = 0; i < nbdkit_extents_count (cache_extents); i++) { + struct nbdkit_extent ex = nbdkit_get_extent (cache_extents, i); + if (nbdkit_add_extent (extents, ex.offset, ex.length, ex.type) < 0) + return -1; + } + + return 0; +} + +static int +cacheextents_fill (struct nbdkit_extents *extents) +{ + size_t i = 0; + size_t count = nbdkit_extents_count (extents); + struct nbdkit_extent first = nbdkit_get_extent (extents, 0); + struct nbdkit_extent last = nbdkit_get_extent (extents, count - 1); + + nbdkit_extents_free (cache_extents); + cache_start = first.offset; + cache_end = last.offset + last.length; + cache_extents = nbdkit_extents_new (cache_start, cache_end); + + for (i = 0; i < count; i++) { + struct nbdkit_extent ex = nbdkit_get_extent (extents, i); + nbdkit_debug ("cacheextents: updating cache with" + ": offset=%" PRIu64 + ": length=%" PRIu64 + "; type=%x", + ex.offset, ex.length, ex.type); + if (nbdkit_add_extent (cache_extents, ex.offset, ex.length, ex.type) < 0) + return -1; + } + + return 0; +} + +static int +cacheextents_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, + int *err) +{ + nbdkit_debug ("cacheextents:" + " cache_start=%" PRIu64 + " cache_end=%" PRIu64 + " cache_extents=%p", + cache_start, cache_end, cache_extents); + + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + + if (cache_extents && + offset >= cache_start && offset < cache_end) { + nbdkit_debug ("cacheextents: returning from cache"); + int r = cacheextents_add (extents); + if (r < 0) + *err = errno; + return r; + } + + nbdkit_debug ("cacheextents: cache miss"); + int r = next_ops->extents (nxdata, count, offset, flags, extents, err); + if (r < 0) + return r; + + if (cacheextents_fill (extents) < 0) { + *err = errno; + return -1; + } + + return r; +} + +/* Any changes to the data need to clean the cache. + * + * Similarly to readahead filter this could be more intelligent, but there would + * be very little benefit. + */ + +static void +kill_cacheextents (void) +{ + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + nbdkit_extents_free (cache_extents); + cache_extents = NULL; +} + +static int +cacheextents_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, + const void *buf, uint32_t count, uint64_t offset, + uint32_t flags, int *err) +{ + kill_cacheextents (); + return next_ops->pwrite (nxdata, buf, count, offset, flags, err); +} + +static int +cacheextents_trim (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, + uint32_t count, uint64_t offset, uint32_t flags, + int *err) +{ + kill_cacheextents (); + return next_ops->trim (nxdata, count, offset, flags, err); +} + +static int +cacheextents_zero (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, + uint32_t count, uint64_t offset, uint32_t flags, + int *err) +{ + kill_cacheextents (); + return next_ops->zero (nxdata, count, offset, flags, err); +} + +static struct nbdkit_filter filter = { + .name = "cacheextents", + .longname = "nbdkit cacheextents filter", + .version = PACKAGE_VERSION, + .unload = cacheextents_unload, + .pwrite = cacheextents_pwrite, + .trim = cacheextents_trim, + .zero = cacheextents_zero, + .extents = cacheextents_extents, +}; + +NBDKIT_REGISTER_FILTER (filter) diff --git a/filters/cacheextents/nbdkit-cacheextents-filter.pod b/filters/cacheextents/nbdkit-cacheextents-filter.pod new file mode 100644 index 000000000000..88f3173b9375 --- /dev/null +++ b/filters/cacheextents/nbdkit-cacheextents-filter.pod @@ -0,0 +1,47 @@ +=head1 NAME + +nbdkit-cacheextents-filter - cache extents + +=head1 SYNOPSIS + + nbdkit --filter=cacheextents plugin + +=head1 DESCRIPTION + +C<nbdkit-cacheextents-filter> is a filter that caches the result of last +extents() call. + +A common use for this filter is to accelerate returning extents data for +clients which ask for extents with the REQ_ONE flag set (like +S<C<qemu-img convert>>) and is especially useful in combination with +plugins that report multiple extents in one call, but with high latency +for each of those calls (like L<nbdkit-vddk-plugin(1)>). For example: + + nbdkit -U - --filter=cacheextents --run 'qemu-img map $nbd' vddk ... + +For files with big extents (when it is unlikely for one extents() call +to return multiple different extents) this does not slow down the +access. + +=head1 PARAMETERS + +There are no parameters specific to nbdkit-cacheextents-filter. Any +parameters are passed through to and processed by the underlying +plugin in the normal way. + +=head1 SEE ALSO + +L<nbdkit(1)>, +L<nbdkit-cache-filter(1)>, +L<nbdkit-readahead-filter(1)>, +L<nbdkit-vddk-plugin(1)>, +L<nbdkit-filter(3)>, +L<qemu-img(1)>. + +=head1 AUTHORS + +Martin Kletzander + +=head1 COPYRIGHT + +Copyright (C) 2019 Red Hat Inc. -- 2.21.0
Richard W.M. Jones
2019-May-15 14:55 UTC
Re: [Libguestfs] [nbdkit PATCH] Introduce cacheextents filter
On Wed, May 15, 2019 at 02:54:17PM +0200, Martin Kletzander wrote:> This filter caches the last result of the extents() call and offers a nice > speed-up for clients that only support req_on=1 in combination with plugins like > vddk, which has no overhead for returning information for multiple extents in > one call, but that call is very time-consuming. > > Quick test showed that on a fast connection and a sparsely allocated 16G disk > with a OS installed `qemu-img map` runs 16s instead of 33s (out of which it > takes 5s to the first extents request). For 100G disk with no data on it, that > is one hole extent spanning the whole disk (where there is no space for > improvement) this does not add any noticeable overhead. > > Signed-off-by: Martin Kletzander <mkletzan@redhat.com> > --- > Yes, I hate the filter name, so if anyone has a better idea, feel free to > suggest (or rename) it.Basically it's fine, however it could really do with having a test. I have no particular comment about the name. A few more things inline below.> +static int > +cacheextents_add (struct nbdkit_extents *extents) > +{ > + size_t i = 0; > + > + for (i = 0; i < nbdkit_extents_count (cache_extents); i++) { > + struct nbdkit_extent ex = nbdkit_get_extent (cache_extents, i); > + if (nbdkit_add_extent (extents, ex.offset, ex.length, ex.type) < 0)This really only returns -1 on failure, and not any other negative values.> + return -1; > + } > + > + return 0; > +} > + > +static int > +cacheextents_fill (struct nbdkit_extents *extents) > +{ > + size_t i = 0; > + size_t count = nbdkit_extents_count (extents); > + struct nbdkit_extent first = nbdkit_get_extent (extents, 0); > + struct nbdkit_extent last = nbdkit_get_extent (extents, count - 1); > + > + nbdkit_extents_free (cache_extents); > + cache_start = first.offset; > + cache_end = last.offset + last.length; > + cache_extents = nbdkit_extents_new (cache_start, cache_end); > + > + for (i = 0; i < count; i++) { > + struct nbdkit_extent ex = nbdkit_get_extent (extents, i); > + nbdkit_debug ("cacheextents: updating cache with" > + ": offset=%" PRIu64 > + ": length=%" PRIu64 > + "; type=%x", > + ex.offset, ex.length, ex.type); > + if (nbdkit_add_extent (cache_extents, ex.offset, ex.length, ex.type) < 0).. and the same here.> + return -1; > + } > + > + return 0; > +} > + > +static int > +cacheextents_extents (struct nbdkit_next_ops *next_ops, void *nxdata, > + void *handle, uint32_t count, uint64_t offset, uint32_t flags, > + struct nbdkit_extents *extents, > + int *err) > +{ > + nbdkit_debug ("cacheextents:" > + " cache_start=%" PRIu64 > + " cache_end=%" PRIu64 > + " cache_extents=%p", > + cache_start, cache_end, cache_extents); > + > + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);Shouldn't the debug statement be protected by the lock too? Even reading global state can be racy.> +static struct nbdkit_filter filter = { > + .name = "cacheextents", > + .longname = "nbdkit cacheextents filter", > + .version = PACKAGE_VERSION, > + .unload = cacheextents_unload, > + .pwrite = cacheextents_pwrite, > + .trim = cacheextents_trim, > + .zero = cacheextents_zero, > + .extents = cacheextents_extents, > +};I know that for some reason emacs C mode has started to indent struct fields like this, which is very annoying, but these would be better indented in the same way as the other plugins and filters. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
Martin Kletzander
2019-May-15 21:42 UTC
Re: [Libguestfs] [nbdkit PATCH] Introduce cacheextents filter
On Wed, May 15, 2019 at 03:55:28PM +0100, Richard W.M. Jones wrote:>On Wed, May 15, 2019 at 02:54:17PM +0200, Martin Kletzander wrote: >> This filter caches the last result of the extents() call and offers a nice >> speed-up for clients that only support req_on=1 in combination with plugins like >> vddk, which has no overhead for returning information for multiple extents in >> one call, but that call is very time-consuming. >> >> Quick test showed that on a fast connection and a sparsely allocated 16G disk >> with a OS installed `qemu-img map` runs 16s instead of 33s (out of which it >> takes 5s to the first extents request). For 100G disk with no data on it, that >> is one hole extent spanning the whole disk (where there is no space for >> improvement) this does not add any noticeable overhead. >> >> Signed-off-by: Martin Kletzander <mkletzan@redhat.com> >> --- >> Yes, I hate the filter name, so if anyone has a better idea, feel free to >> suggest (or rename) it. > >Basically it's fine, however it could really do with having a test. I >have no particular comment about the name. >I was thinking about a test, that's right. I'll see how to do one properly.>A few more things inline below. > >> +static int >> +cacheextents_add (struct nbdkit_extents *extents) >> +{ >> + size_t i = 0; >> + >> + for (i = 0; i < nbdkit_extents_count (cache_extents); i++) { >> + struct nbdkit_extent ex = nbdkit_get_extent (cache_extents, i); >> + if (nbdkit_add_extent (extents, ex.offset, ex.length, ex.type) < 0) > >This really only returns -1 on failure, and not any other negative >values. >I got used to this pattern when we started introducing different errors with different negative values and modifying all callers would be error-prone. I'll change this. Should I change all the (r < 0) as well?>> + return -1; >> + } >> + >> + return 0; >> +} >> + >> +static int >> +cacheextents_fill (struct nbdkit_extents *extents) >> +{ >> + size_t i = 0; >> + size_t count = nbdkit_extents_count (extents); >> + struct nbdkit_extent first = nbdkit_get_extent (extents, 0); >> + struct nbdkit_extent last = nbdkit_get_extent (extents, count - 1); >> + >> + nbdkit_extents_free (cache_extents); >> + cache_start = first.offset; >> + cache_end = last.offset + last.length; >> + cache_extents = nbdkit_extents_new (cache_start, cache_end); >> + >> + for (i = 0; i < count; i++) { >> + struct nbdkit_extent ex = nbdkit_get_extent (extents, i); >> + nbdkit_debug ("cacheextents: updating cache with" >> + ": offset=%" PRIu64 >> + ": length=%" PRIu64 >> + "; type=%x", >> + ex.offset, ex.length, ex.type); >> + if (nbdkit_add_extent (cache_extents, ex.offset, ex.length, ex.type) < 0) > >.. and the same here. > >> + return -1; >> + } >> + >> + return 0; >> +} >> + >> +static int >> +cacheextents_extents (struct nbdkit_next_ops *next_ops, void *nxdata, >> + void *handle, uint32_t count, uint64_t offset, uint32_t flags, >> + struct nbdkit_extents *extents, >> + int *err) >> +{ >> + nbdkit_debug ("cacheextents:" >> + " cache_start=%" PRIu64 >> + " cache_end=%" PRIu64 >> + " cache_extents=%p", >> + cache_start, cache_end, cache_extents); >> + >> + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); > >Shouldn't the debug statement be protected by the lock too? Even >reading global state can be racy. >Yes, you're right.>> +static struct nbdkit_filter filter = { >> + .name = "cacheextents", >> + .longname = "nbdkit cacheextents filter", >> + .version = PACKAGE_VERSION, >> + .unload = cacheextents_unload, >> + .pwrite = cacheextents_pwrite, >> + .trim = cacheextents_trim, >> + .zero = cacheextents_zero, >> + .extents = cacheextents_extents, >> +}; > >I know that for some reason emacs C mode has started to indent struct >fields like this, which is very annoying, but these would be better >indented in the same way as the other plugins and filters. >Funny that I actually missed this, it looks weird here to me as well. Will fix in v2. Thanks for the review.>Rich. > >-- >Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones >Read my programming and virtualization blog: http://rwmj.wordpress.com >libguestfs lets you edit virtual machines. Supports shell scripting, >bindings from many languages. http://libguestfs.org