Richard W.M. Jones
2019-Sep-19 11:34 UTC
[Libguestfs] [PATCH nbdkit 0/2] Add new retry filter.
This is a retry filter implementation as outlined here: https://www.redhat.com/archives/libguestfs/2019-September/msg00167.html It is only lightly tested. One way to test it is to try an SSH copy (see the commit message for patch 2/2), and in the middle of the copy kill the per-connection sshd on the remote machine. You will see that the copy recovers after a few seconds. Add the nbdkit -v option to see some useful messages from the filter: ... nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=216006656 nbdkit: ssh[1]: error: read failed: Socket error: Connection reset by peer (-1) nbdkit: ssh[1]: debug: retry 1: original errno = 5 nbdkit: ssh[1]: debug: waiting 2 seconds before retrying nbdkit: ssh[1]: debug: ssh: reopen nbdkit: ssh[1]: debug: close nbdkit: ssh[1]: error: cannot close file: Socket error: Connection reset by peer nbdkit: ssh[1]: debug: ssh: open readonly=0 nbdkit: ssh[1]: debug: opened libssh handle nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=216006656 ... It really needs an actual test, which is tricky to implement. My thinking currently is that a custom (sh) plugin is the way to go, but it would also be nice to test curl/ssh/vddk with this filter -- if anyone has any suggestions on how to do that ... This patch will probably conflict with Eric's series here: https://www.redhat.com/archives/libguestfs/2019-September/msg00180.html but that shouldn't be too hard to fix up. Rich.
Richard W.M. Jones
2019-Sep-19 11:34 UTC
[Libguestfs] [PATCH nbdkit 1/2] filters: Implement next_ops .reopen call.
This is intended for use by the forthcoming retry filter to close and reopen the backend chain. It is handled entirely by server/backend.c as no cooperation is needed with the plugin. Note the explicit readonly parameter: An alternative would be to store the previous readonly setting in the b_conn_handle struct. However passing it explicitly allows the retry filter to retry as readonly, which might be useful. This design does however require any filter which might call .reopen to save the original readonly parameter from the .open call. --- include/nbdkit-filter.h | 6 ++++++ server/backend.c | 9 +++++++++ server/filters.c | 8 ++++++++ server/internal.h | 4 ++++ 4 files changed, 27 insertions(+) diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h index babce0d..c1930c1 100644 --- a/include/nbdkit-filter.h +++ b/include/nbdkit-filter.h @@ -68,6 +68,12 @@ typedef int nbdkit_next_open (void *nxdata, int readonly); struct nbdkit_next_ops { + /* Performs close + open on the underlying chain. + * Used by the retry filter. + */ + int (*reopen) (void *nxdata, int readonly); + + /* The rest of the next ops are the same as normal plugin operations. */ int64_t (*get_size) (void *nxdata); int (*can_write) (void *nxdata); diff --git a/server/backend.c b/server/backend.c index 3b213bf..8a434bd 100644 --- a/server/backend.c +++ b/server/backend.c @@ -221,6 +221,15 @@ backend_valid_range (struct backend *b, struct connection *conn, /* Wrappers for all callbacks in a filter's struct nbdkit_next_ops. */ +int +backend_reopen (struct backend *b, struct connection *conn, int readonly) +{ + debug ("%s: reopen", b->name); + + b->close (b, conn); + return backend_open (b, conn, readonly); +} + int64_t backend_get_size (struct backend *b, struct connection *conn) { diff --git a/server/filters.c b/server/filters.c index 1ee6282..8a7e637 100644 --- a/server/filters.c +++ b/server/filters.c @@ -240,6 +240,13 @@ filter_close (struct backend *b, struct connection *conn) * single ‘void *nxdata’ struct pointer (‘b_conn’). */ +static int +next_reopen (void *nxdata, int readonly) +{ + struct b_conn *b_conn = nxdata; + return backend_reopen (b_conn->b, b_conn->conn, readonly); +} + static int64_t next_get_size (void *nxdata) { @@ -376,6 +383,7 @@ next_cache (void *nxdata, uint32_t count, uint64_t offset, } static struct nbdkit_next_ops next_ops = { + .reopen = next_reopen, .get_size = next_get_size, .can_write = next_can_write, .can_flush = next_can_flush, diff --git a/server/internal.h b/server/internal.h index c31bb34..06a2e4f 100644 --- a/server/internal.h +++ b/server/internal.h @@ -290,6 +290,7 @@ struct backend { int (*prepare) (struct backend *, struct connection *conn, int readonly); int (*finalize) (struct backend *, struct connection *conn); void (*close) (struct backend *, struct connection *conn); + int (*reopen) (struct backend *, struct connection *conn); int64_t (*get_size) (struct backend *, struct connection *conn); int (*can_write) (struct backend *, struct connection *conn); @@ -341,6 +342,9 @@ extern bool backend_valid_range (struct backend *b, struct connection *conn, uint64_t offset, uint32_t count) __attribute__((__nonnull__ (1, 2))); +extern int backend_reopen (struct backend *b, struct connection *conn, + int readonly) + __attribute__((__nonnull__ (1, 2))); extern int64_t backend_get_size (struct backend *b, struct connection *conn) __attribute__((__nonnull__ (1, 2))); extern int backend_can_write (struct backend *b, struct connection *conn) -- 2.23.0
Richard W.M. Jones
2019-Sep-19 11:34 UTC
[Libguestfs] [PATCH nbdkit 2/2] Add new retry filter.
This filter can be used to transparently reopen/retry when a plugin fails. The connection is closed and reopened which for most plugins causes them to attempt to reconnect to their source. For example if doing a long or slow SSH copy: nbdkit -U - ssh host=remote /var/tmp/test.iso \ --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2' if the SSH connection or network goes down in the middle then the whole operation will fail. By adding the retry filter: nbdkit -U - ssh --filter=retry host=remote /var/tmp/test.iso \ --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2' this operation can recover from temporary failures in at least some circumstances. The NBD connection (a local Unix domain socket in the example above) is not interrupted during retries, so NBD clients don't need to be taught how to retry - everything is handled internally by nbdkit. --- TODO | 6 + configure.ac | 2 + docs/nbdkit-captive.pod | 3 + filters/retry/Makefile.am | 67 ++++++ filters/retry/nbdkit-retry-filter.pod | 103 +++++++++ filters/retry/retry.c | 296 ++++++++++++++++++++++++++ plugins/curl/nbdkit-curl-plugin.pod | 1 + plugins/ssh/nbdkit-ssh-plugin.pod | 1 + plugins/vddk/nbdkit-vddk-plugin.pod | 2 + server/backend.c | 8 + 10 files changed, 489 insertions(+) diff --git a/TODO b/TODO index d2cf0ae..02a095e 100644 --- a/TODO +++ b/TODO @@ -136,6 +136,12 @@ nbdkit-rate-filter: * split large requests to avoid long, lumpy sleeps when request size is much larger than rate limit +nbdkit-retry-filter: + +* allow user to specify which errors cause a retry and which ones are + passed through; for example there's probably no point retrying on + ENOMEM + Filters for security -------------------- diff --git a/configure.ac b/configure.ac index 8261a7f..7ce3dbb 100644 --- a/configure.ac +++ b/configure.ac @@ -872,6 +872,7 @@ filters="\ partition \ rate \ readahead \ + retry \ stats \ truncate \ xz \ @@ -952,6 +953,7 @@ AC_CONFIG_FILES([Makefile filters/partition/Makefile filters/rate/Makefile filters/readahead/Makefile + filters/retry/Makefile filters/stats/Makefile filters/truncate/Makefile filters/xz/Makefile diff --git a/docs/nbdkit-captive.pod b/docs/nbdkit-captive.pod index 59df669..0adf0a0 100644 --- a/docs/nbdkit-captive.pod +++ b/docs/nbdkit-captive.pod @@ -92,6 +92,9 @@ help performance: nbdkit -U - --filter=readahead curl https://example.com/disk.img \ --run 'qemu-img convert $nbd disk.img' +If the source suffers from temporary network failures +L<nbdkit-retry-filter(1)> may help. + To overwrite a file inside an uncompressed tar file (the file being overwritten must be the same size), use L<nbdkit-tar-plugin(1)> like this: diff --git a/filters/retry/Makefile.am b/filters/retry/Makefile.am new file mode 100644 index 0000000..86011bb --- /dev/null +++ b/filters/retry/Makefile.am @@ -0,0 +1,67 @@ +# nbdkit +# Copyright (C) 2019 Red Hat Inc. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +include $(top_srcdir)/common-rules.mk + +EXTRA_DIST = nbdkit-retry-filter.pod + +filter_LTLIBRARIES = nbdkit-retry-filter.la + +nbdkit_retry_filter_la_SOURCES = \ + retry.c \ + $(top_srcdir)/include/nbdkit-filter.h \ + $(NULL) + +nbdkit_retry_filter_la_CPPFLAGS = \ + -I$(top_srcdir)/include \ + -I$(top_srcdir)/common/include \ + -I$(top_srcdir)/common/utils \ + $(NULL) +nbdkit_retry_filter_la_CFLAGS = $(WARNINGS_CFLAGS) +nbdkit_retry_filter_la_LDFLAGS = \ + -module -avoid-version -shared \ + -Wl,--version-script=$(top_srcdir)/filters/filters.syms \ + $(NULL) +nbdkit_retry_filter_la_LIBADD = \ + $(top_builddir)/common/utils/libutils.la \ + $(NULL) + +if HAVE_POD + +man_MANS = nbdkit-retry-filter.1 +CLEANFILES += $(man_MANS) + +nbdkit-retry-filter.1: nbdkit-retry-filter.pod + $(PODWRAPPER) --section=1 --man $@ \ + --html $(top_builddir)/html/$@.html \ + $< + +endif HAVE_POD diff --git a/filters/retry/nbdkit-retry-filter.pod b/filters/retry/nbdkit-retry-filter.pod new file mode 100644 index 0000000..aeed79c --- /dev/null +++ b/filters/retry/nbdkit-retry-filter.pod @@ -0,0 +1,103 @@ +=head1 NAME + +nbdkit-retry-filter - reopen connection on error + +=head1 SYNOPSIS + + nbdkit --filter=retry PLUGIN + +=head1 DESCRIPTION + +C<nbdkit-retry-filter> is a filter that transparently reopens the +plugin connection when an error is encountered. It can be used to +make long-running copy operations reliable in the presence of +temporary network failures, without requiring any changes to the +plugin. + +A number of parameters are available to control: + +=over 4 + +=item * + +How many times we retry. + +=item * + +The delay between retries, and whether we wait longer each time (known +as "exponential back-off"). + +=item * + +If we reopen the plugin in read-only mode after the first failure. + +=back + +The default (no parameters) is designed to offer a happy medium +between recovering from short temporary failures but not doing +anything too bad when permanent or unrecoverable failures happen: We +retry 5 times with exponential back-off, waiting in total about 1 +minute before we give up. + +=head1 EXAMPLE + +In this example we copy and convert a large file using +L<nbdkit-ssh-plugin(1)>, L<qemu-img(1)> and L<nbdkit-captive(1)>. + + nbdkit -U - ssh --filter=retry host=remote /var/tmp/test.iso \ + --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2' + +Without I<--filter=retry> a temporary failure would cause the copy to +fail (for example, the remote host’s firewall is restarted causing the +SSH connection to be closed). Adding this filter means that it may be +possible to transparently recover. + +=head1 PARAMETERS + +=over 4 + +=item B<retries=>N + +The number of times any single operation will be retried before we +give up and fail the operation. The default is 5. + +=item B<retry-delay=>N + +The number of seconds to wait before retrying. The default is 2 +seconds. + +=item B<retry-exponential=yes> + +Use exponential back-off. The retry delay is doubled between each +retry. This is the default. + +=item B<retry-exponential=no> + +Do not use exponential back-off. The retry delay is the same between +each retry. + +=item B<retry-readonly=yes> + +As soon as a failure occurs and we retry, switch the underlying plugin +permanently to read-only mode. + +=item B<retry-readonly=no> + +Do not change the read-write/read-only mode of the plugin when +retrying. This is the default. + +=back + +=head1 SEE ALSO + +L<nbdkit(1)>, +L<nbdkit-filter(3)>, +L<nbdkit-readahead-filter(1)>. + +=head1 AUTHORS + +Richard W.M. Jones + +=head1 COPYRIGHT + +Copyright (C) 2019 Red Hat Inc. diff --git a/filters/retry/retry.c b/filters/retry/retry.c new file mode 100644 index 0000000..cee7a5b --- /dev/null +++ b/filters/retry/retry.c @@ -0,0 +1,296 @@ +/* nbdkit + * Copyright (C) 2019 Red Hat Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <stdbool.h> +#include <inttypes.h> +#include <string.h> +#include <sys/time.h> + +#include <nbdkit-filter.h> + +#include "cleanup.h" + +static int retries = 5; /* 0 = filter is disabled */ +static int initial_delay = 2; +static bool exponential_backoff = true; +static bool force_readonly = false; + +static int +retry_config (nbdkit_next_config *next, void *nxdata, + const char *key, const char *value) +{ + int r; + + if (strcmp (key, "retries") == 0) { + if (sscanf (value, "%d", &retries) != 1 || retries < 0) { + nbdkit_error ("cannot parse retries: %s", value); + return -1; + } + } + else if (strcmp (key, "retry-delay") == 0) { + if (sscanf (value, "%d", &initial_delay) != 1 || initial_delay <= 0) { + nbdkit_error ("cannot parse retry-delay: %s", value); + return -1; + } + } + else if (strcmp (key, "retry-exponential") == 0) { + r = nbdkit_parse_bool (value); + if (r == -1) + return -1; + exponential_backoff = r; + return 0; + } + else if (strcmp (key, "retry-readonly") == 0) { + r = nbdkit_parse_bool (value); + if (r == -1) + return -1; + force_readonly = r; + return 0; + } + + return next (nxdata, key, value); +} + +#define retry_config_help \ + "retries=<N> Number of retries (default: 5).\n" \ + "retry-delay=<N> Second to wait before retry (default: 2).\n" \ + "retry-exponential=yes|no Exponential back-off (default: yes).\n" \ + "retry-readonly=yes|no Force read-only on failure (default: no).\n" + +struct retry_handle { + int readonly; /* Save original readonly setting. */ +}; + +static void * +retry_open (nbdkit_next_open *next, void *nxdata, int readonly) +{ + struct retry_handle *h; + + if (next (nxdata, readonly) == -1) + return NULL; + + h = malloc (sizeof *h); + if (h == NULL) { + nbdkit_error ("malloc: %m"); + return NULL; + } + + h->readonly = readonly; + + return h; +} + +static void +retry_close (void *handle) +{ + struct retry_handle *h = handle; + + free (h); +} + +/* This function encapsulates the common retry logic used across all + * data commands. If it returns true then the data command will retry + * the operation. ‘struct retry_data’ is stack data saved between + * retries within the same command, and is initialized to zero. + */ +struct retry_data { + int retry; /* Retry number (0 = first time). */ + int delay; /* Seconds to wait before retrying. */ +}; + +static bool +do_retry (struct retry_handle *h, + struct retry_data *data, + struct nbdkit_next_ops *next_ops, void *nxdata, + int *err) +{ + /* If it's the first retry, initialize the other fields in *data. */ + if (data->retry == 0) + data->delay = initial_delay; + + /* Log the original errno since it will be lost when we retry. */ + nbdkit_debug ("retry %d: original errno = %d", data->retry+1, *err); + + if (data->retry >= retries) { + nbdkit_debug ("could not recover after %d retries", retries); + return false; + } + + nbdkit_debug ("waiting %d seconds before retrying", data->delay); + if (nbdkit_nanosleep (data->delay, 0) == -1) { + /* We could do this but it would overwrite the more important + * errno from the underlying data call. + */ + /* *err = errno; */ + return false; + } + + /* Reopen the connection. */ + if (next_ops->reopen (nxdata, h->readonly || force_readonly) == -1) + return false; + + /* Update *data in case we are called again. */ + data->retry++; + if (exponential_backoff) + data->delay *= 2; + + /* Retry the data command. */ + return true; +} + +static int +retry_pread (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, void *buf, uint32_t count, uint64_t offset, + uint32_t flags, int *err) +{ + struct retry_handle *h = handle; + struct retry_data data = {0}; + int r; + + again: + r = next_ops->pread (nxdata, buf, count, offset, flags, err); + if (r == -1 && do_retry (h, &data, next_ops, nxdata, err)) goto again; + + return r; +} + +/* Write. */ +static int +retry_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, + const void *buf, uint32_t count, uint64_t offset, + uint32_t flags, int *err) +{ + struct retry_handle *h = handle; + struct retry_data data = {0}; + int r; + + again: + r = next_ops->pwrite (nxdata, buf, count, offset, flags, err); + if (r == -1 && do_retry (h, &data, next_ops, nxdata, err)) goto again; + + return r; +} + +/* Trim. */ +static int +retry_trim (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, + uint32_t count, uint64_t offset, uint32_t flags, + int *err) +{ + struct retry_handle *h = handle; + struct retry_data data = {0}; + int r; + + again: + r = next_ops->trim (nxdata, count, offset, flags, err); + if (r == -1 && do_retry (h, &data, next_ops, nxdata, err)) goto again; + + return r; +} + +/* Zero. */ +static int +retry_zero (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, + uint32_t count, uint64_t offset, uint32_t flags, + int *err) +{ + struct retry_handle *h = handle; + struct retry_data data = {0}; + int r; + + again: + r = next_ops->zero (nxdata, count, offset, flags, err); + if (r == -1 && do_retry (h, &data, next_ops, nxdata, err)) goto again; + + return r; +} + +/* Extents. */ +static int +retry_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, + uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + struct retry_handle *h = handle; + struct retry_data data = {0}; + int r; + + again: + r = next_ops->extents (nxdata, count, offset, flags, extents, err); + if (r == -1 && do_retry (h, &data, next_ops, nxdata, err)) goto again; + + return r; +} + +/* Cache. */ +static int +retry_cache (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, + uint32_t count, uint64_t offset, uint32_t flags, + int *err) +{ + struct retry_handle *h = handle; + struct retry_data data = {0}; + int r; + + again: + r = next_ops->cache (nxdata, count, offset, flags, err); + if (r == -1 && do_retry (h, &data, next_ops, nxdata, err)) goto again; + + return r; +} + +static struct nbdkit_filter filter = { + .name = "retry", + .longname = "nbdkit retry filter", + .config = retry_config, + .config_help = retry_config_help, + .open = retry_open, + .close = retry_close, + .pread = retry_pread, + .pwrite = retry_pwrite, + .trim = retry_trim, + .zero = retry_zero, + .extents = retry_extents, + .cache = retry_cache, +}; + +NBDKIT_REGISTER_FILTER(filter) diff --git a/plugins/curl/nbdkit-curl-plugin.pod b/plugins/curl/nbdkit-curl-plugin.pod index 1c96b4a..2d5c564 100644 --- a/plugins/curl/nbdkit-curl-plugin.pod +++ b/plugins/curl/nbdkit-curl-plugin.pod @@ -159,6 +159,7 @@ L<CURLOPT_COOKIE(3)> L<CURLOPT_VERBOSE(3)>, L<nbdkit(1)>, L<nbdkit-readahead-filter(1)>, +L<nbdkit-retry-filter(1)>, L<nbdkit-ssh-plugin(1)>, L<nbdkit-plugin(3)>, L<http://curl.haxx.se>. diff --git a/plugins/ssh/nbdkit-ssh-plugin.pod b/plugins/ssh/nbdkit-ssh-plugin.pod index 2010837..316e852 100644 --- a/plugins/ssh/nbdkit-ssh-plugin.pod +++ b/plugins/ssh/nbdkit-ssh-plugin.pod @@ -288,6 +288,7 @@ libssh itself. L<nbdkit(1)>, L<nbdkit-curl-plugin(1)>, L<nbdkit-readahead-filter(1)>, +L<nbdkit-retry-filter(1)>, L<nbdkit-plugin(3)>, L<ssh(1)>, L<ssh-agent(1)>, diff --git a/plugins/vddk/nbdkit-vddk-plugin.pod b/plugins/vddk/nbdkit-vddk-plugin.pod index 8820a50..9e451c6 100644 --- a/plugins/vddk/nbdkit-vddk-plugin.pod +++ b/plugins/vddk/nbdkit-vddk-plugin.pod @@ -431,6 +431,8 @@ writing. L<nbdkit(1)>, L<nbdkit-plugin(3)>, +L<nbdkit-readahead-filter(1)>, +L<nbdkit-retry-filter(1)>, L<virsh(1)>, L<https://www.vmware.com/support/developer/vddk/> diff --git a/server/backend.c b/server/backend.c index 8a434bd..b8c5742 100644 --- a/server/backend.c +++ b/server/backend.c @@ -224,9 +224,17 @@ backend_valid_range (struct backend *b, struct connection *conn, int backend_reopen (struct backend *b, struct connection *conn, int readonly) { + struct b_conn_handle *h = &conn->handles[b->i]; + debug ("%s: reopen", b->name); b->close (b, conn); + + /* This forces .open to recalculate h->can_write, which might have + * changed since we may have a new readonly value. + */ + h->can_write = -1; + return backend_open (b, conn, readonly); } -- 2.23.0
Richard W.M. Jones
2019-Sep-19 11:37 UTC
Re: [Libguestfs] [PATCH nbdkit 2/2] Add new retry filter.
On Thu, Sep 19, 2019 at 12:34:17PM +0100, Richard W.M. Jones wrote:> diff --git a/server/backend.c b/server/backend.c > index 8a434bd..b8c5742 100644 > --- a/server/backend.c > +++ b/server/backend.c > @@ -224,9 +224,17 @@ backend_valid_range (struct backend *b, struct connection *conn, > int > backend_reopen (struct backend *b, struct connection *conn, int readonly) > { > + struct b_conn_handle *h = &conn->handles[b->i]; > + > debug ("%s: reopen", b->name); > > b->close (b, conn); > + > + /* This forces .open to recalculate h->can_write, which might have > + * changed since we may have a new readonly value. > + */ > + h->can_write = -1; > + > return backend_open (b, conn, readonly); > }Obviously this hunk should be in the first patch ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
Nenad Peric
2019-Sep-19 11:47 UTC
Re: [Libguestfs] [PATCH nbdkit 0/2] Add new retry filter.
A theoretical way to test this against any other protocol is to have a firewall rule which can be triggered locally which just drops the traffic and then re-enables the traffic in a while (short or long, can be decided/tweaked). This way, there is no need to think about which protocol is used, when we kill the connectivity. The only consideration could probably be that the dropped connections would cover only the ports we need. To reduce the size of the hammer :) *Nenad Perić* PRINCIPAL SOFTWARE ENGINEER Red Hat - Migration Engineering nenad@redhat.com <http://redhat.com> On Thu, Sep 19, 2019 at 1:34 PM Richard W.M. Jones <rjones@redhat.com> wrote:> This is a retry filter implementation as outlined here: > https://www.redhat.com/archives/libguestfs/2019-September/msg00167.html > > It is only lightly tested. One way to test it is to try an SSH copy > (see the commit message for patch 2/2), and in the middle of the copy > kill the per-connection sshd on the remote machine. You will see that > the copy recovers after a few seconds. Add the nbdkit -v option to > see some useful messages from the filter: > > ... > nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=216006656 > nbdkit: ssh[1]: error: read failed: Socket error: Connection reset by > peer (-1) > nbdkit: ssh[1]: debug: retry 1: original errno = 5 > nbdkit: ssh[1]: debug: waiting 2 seconds before retrying > nbdkit: ssh[1]: debug: ssh: reopen > nbdkit: ssh[1]: debug: close > nbdkit: ssh[1]: error: cannot close file: Socket error: Connection reset > by peer > nbdkit: ssh[1]: debug: ssh: open readonly=0 > nbdkit: ssh[1]: debug: opened libssh handle > nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=216006656 > ... > > It really needs an actual test, which is tricky to implement. My > thinking currently is that a custom (sh) plugin is the way to go, but > it would also be nice to test curl/ssh/vddk with this filter -- if > anyone has any suggestions on how to do that ... > > This patch will probably conflict with Eric's series here: > https://www.redhat.com/archives/libguestfs/2019-September/msg00180.html > but that shouldn't be too hard to fix up. > > Rich. > > >
Martin Kletzander
2019-Sep-19 12:04 UTC
Re: [Libguestfs] [PATCH nbdkit 0/2] Add new retry filter.
On Thu, Sep 19, 2019 at 01:47:45PM +0200, Nenad Peric wrote:>A theoretical way to test this against any other protocol is to have a >firewall rule which can be triggered locally which just drops the traffic >and then re-enables the traffic in a while (short or long, can be >decided/tweaked). >This way, there is no need to think about which protocol is used, when we >kill the connectivity. >The only consideration could probably be that the dropped connections would >cover only the ports we need. >To reduce the size of the hammer :) >Good idea, you just need to keep in mind that the underlying protocols might have their own keep-alive methods, which should be removed or at least minimised when testing this.> >*Nenad Perić* > >PRINCIPAL SOFTWARE ENGINEER > >Red Hat - Migration Engineering > >nenad@redhat.com > ><http://redhat.com> > > >On Thu, Sep 19, 2019 at 1:34 PM Richard W.M. Jones <rjones@redhat.com> >wrote: > >> This is a retry filter implementation as outlined here: >> https://www.redhat.com/archives/libguestfs/2019-September/msg00167.html >> >> It is only lightly tested. One way to test it is to try an SSH copy >> (see the commit message for patch 2/2), and in the middle of the copy >> kill the per-connection sshd on the remote machine. You will see that >> the copy recovers after a few seconds. Add the nbdkit -v option to >> see some useful messages from the filter: >> >> ... >> nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=216006656 >> nbdkit: ssh[1]: error: read failed: Socket error: Connection reset by >> peer (-1) >> nbdkit: ssh[1]: debug: retry 1: original errno = 5 >> nbdkit: ssh[1]: debug: waiting 2 seconds before retrying >> nbdkit: ssh[1]: debug: ssh: reopen >> nbdkit: ssh[1]: debug: close >> nbdkit: ssh[1]: error: cannot close file: Socket error: Connection reset >> by peer >> nbdkit: ssh[1]: debug: ssh: open readonly=0 >> nbdkit: ssh[1]: debug: opened libssh handle >> nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=216006656 >> ... >> >> It really needs an actual test, which is tricky to implement. My >> thinking currently is that a custom (sh) plugin is the way to go, but >> it would also be nice to test curl/ssh/vddk with this filter -- if >> anyone has any suggestions on how to do that ... >> >> This patch will probably conflict with Eric's series here: >> https://www.redhat.com/archives/libguestfs/2019-September/msg00180.html >> but that shouldn't be too hard to fix up. >> >> Rich. >> >> >>
Richard W.M. Jones
2019-Sep-19 12:49 UTC
Re: [Libguestfs] [PATCH nbdkit 0/2] Add new retry filter.
On Thu, Sep 19, 2019 at 01:47:45PM +0200, Nenad Peric wrote:> A theoretical way to test this against any other protocol is to have a > firewall rule which can be triggered locally which just drops the traffic > and then re-enables the traffic in a while (short or long, can be > decided/tweaked).We generally want our tests to run without root and without doing weird stuff to the machine (they run from ‘make check’) so I'm afraid updating firewall rules isn't going to work :-( Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Possibly Parallel Threads
- [PATCH nbdkit 0/2] Add new retry filter.
- [PATCH nbdkit v3 1/3] filters: Implement next_ops .reopen call.
- [nbdkit PATCH 0/5] More retry fixes
- [PATCH nbdkit v2 2/4] filters: Implement next_ops .reopen call.
- [nbdkit PATCH 5/5] server: Ensure .finalize and .close are called as needed