thr3ads.net - Libguestfs - Re: [Libguestfs] [PATCH nbdkit 2/2] Add new retry filter. [Sep 2019]

If this information is useful, please help other people find it:
Share via:

Richard W.M. Jones

2019-Sep-19 11:34 UTC

[Libguestfs] [PATCH nbdkit 0/2] Add new retry filter.

This is a retry filter implementation as outlined here:
https://www.redhat.com/archives/libguestfs/2019-September/msg00167.html

It is only lightly tested.  One way to test it is to try an SSH copy
(see the commit message for patch 2/2), and in the middle of the copy
kill the per-connection sshd on the remote machine.  You will see that
the copy recovers after a few seconds.  Add the nbdkit -v option to
see some useful messages from the filter:

  ...
  nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=216006656
  nbdkit: ssh[1]: error: read failed: Socket error: Connection reset by peer
(-1)
  nbdkit: ssh[1]: debug: retry 1: original errno = 5
  nbdkit: ssh[1]: debug: waiting 2 seconds before retrying
  nbdkit: ssh[1]: debug: ssh: reopen
  nbdkit: ssh[1]: debug: close
  nbdkit: ssh[1]: error: cannot close file: Socket error: Connection reset by
peer
  nbdkit: ssh[1]: debug: ssh: open readonly=0
  nbdkit: ssh[1]: debug: opened libssh handle
  nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=216006656
  ...

It really needs an actual test, which is tricky to implement.  My
thinking currently is that a custom (sh) plugin is the way to go, but
it would also be nice to test curl/ssh/vddk with this filter -- if
anyone has any suggestions on how to do that ...

This patch will probably conflict with Eric's series here:
https://www.redhat.com/archives/libguestfs/2019-September/msg00180.html
but that shouldn't be too hard to fix up.

Rich.

Richard W.M. Jones

2019-Sep-19 11:34 UTC

head link

[Libguestfs] [PATCH nbdkit 1/2] filters: Implement next_ops .reopen call.

This is intended for use by the forthcoming retry filter to close and
reopen the backend chain.  It is handled entirely by server/backend.c
as no cooperation is needed with the plugin.

Note the explicit readonly parameter: An alternative would be to store
the previous readonly setting in the b_conn_handle struct.  However
passing it explicitly allows the retry filter to retry as readonly,
which might be useful.  This design does however require any filter
which might call .reopen to save the original readonly parameter from
the .open call.
---
 include/nbdkit-filter.h | 6 ++++++
 server/backend.c        | 9 +++++++++
 server/filters.c        | 8 ++++++++
 server/internal.h       | 4 ++++
 4 files changed, 27 insertions(+)

diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h
index babce0d..c1930c1 100644
--- a/include/nbdkit-filter.h
+++ b/include/nbdkit-filter.h
@@ -68,6 +68,12 @@ typedef int nbdkit_next_open (void *nxdata,
                               int readonly);
 
 struct nbdkit_next_ops {
+  /* Performs close + open on the underlying chain.
+   * Used by the retry filter.
+   */
+  int (*reopen) (void *nxdata, int readonly);
+
+  /* The rest of the next ops are the same as normal plugin operations. */
   int64_t (*get_size) (void *nxdata);
 
   int (*can_write) (void *nxdata);
diff --git a/server/backend.c b/server/backend.c
index 3b213bf..8a434bd 100644
--- a/server/backend.c
+++ b/server/backend.c
@@ -221,6 +221,15 @@ backend_valid_range (struct backend *b, struct connection
*conn,
 
 /* Wrappers for all callbacks in a filter's struct nbdkit_next_ops. */
 
+int
+backend_reopen (struct backend *b, struct connection *conn, int readonly)
+{
+  debug ("%s: reopen", b->name);
+
+  b->close (b, conn);
+  return backend_open (b, conn, readonly);
+}
+
 int64_t
 backend_get_size (struct backend *b, struct connection *conn)
 {
diff --git a/server/filters.c b/server/filters.c
index 1ee6282..8a7e637 100644
--- a/server/filters.c
+++ b/server/filters.c
@@ -240,6 +240,13 @@ filter_close (struct backend *b, struct connection *conn)
  * single ‘void *nxdata’ struct pointer (‘b_conn’).
  */
 
+static int
+next_reopen (void *nxdata, int readonly)
+{
+  struct b_conn *b_conn = nxdata;
+  return backend_reopen (b_conn->b, b_conn->conn, readonly);
+}
+
 static int64_t
 next_get_size (void *nxdata)
 {
@@ -376,6 +383,7 @@ next_cache (void *nxdata, uint32_t count, uint64_t offset,
 }
 
 static struct nbdkit_next_ops next_ops = {
+  .reopen = next_reopen,
   .get_size = next_get_size,
   .can_write = next_can_write,
   .can_flush = next_can_flush,
diff --git a/server/internal.h b/server/internal.h
index c31bb34..06a2e4f 100644
--- a/server/internal.h
+++ b/server/internal.h
@@ -290,6 +290,7 @@ struct backend {
   int (*prepare) (struct backend *, struct connection *conn, int readonly);
   int (*finalize) (struct backend *, struct connection *conn);
   void (*close) (struct backend *, struct connection *conn);
+  int (*reopen) (struct backend *, struct connection *conn);
 
   int64_t (*get_size) (struct backend *, struct connection *conn);
   int (*can_write) (struct backend *, struct connection *conn);
@@ -341,6 +342,9 @@ extern bool backend_valid_range (struct backend *b, struct
connection *conn,
                                  uint64_t offset, uint32_t count)
   __attribute__((__nonnull__ (1, 2)));
 
+extern int backend_reopen (struct backend *b, struct connection *conn,
+                           int readonly)
+  __attribute__((__nonnull__ (1, 2)));
 extern int64_t backend_get_size (struct backend *b, struct connection *conn)
   __attribute__((__nonnull__ (1, 2)));
 extern int backend_can_write (struct backend *b, struct connection *conn)
-- 
2.23.0

Richard W.M. Jones

2019-Sep-19 11:34 UTC

head link

[Libguestfs] [PATCH nbdkit 2/2] Add new retry filter.

This filter can be used to transparently reopen/retry when a plugin
fails.  The connection is closed and reopened which for most plugins
causes them to attempt to reconnect to their source.

For example if doing a long or slow SSH copy:

  nbdkit -U - ssh host=remote /var/tmp/test.iso \
    --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2'

if the SSH connection or network goes down in the middle then the
whole operation will fail.

By adding the retry filter:

  nbdkit -U - ssh --filter=retry host=remote /var/tmp/test.iso \
    --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2'

this operation can recover from temporary failures in at least some
circumstances.  The NBD connection (a local Unix domain socket in the
example above) is not interrupted during retries, so NBD clients don't
need to be taught how to retry - everything is handled internally by
nbdkit.
---
 TODO                                  |   6 +
 configure.ac                          |   2 +
 docs/nbdkit-captive.pod               |   3 +
 filters/retry/Makefile.am             |  67 ++++++
 filters/retry/nbdkit-retry-filter.pod | 103 +++++++++
 filters/retry/retry.c                 | 296 ++++++++++++++++++++++++++
 plugins/curl/nbdkit-curl-plugin.pod   |   1 +
 plugins/ssh/nbdkit-ssh-plugin.pod     |   1 +
 plugins/vddk/nbdkit-vddk-plugin.pod   |   2 +
 server/backend.c                      |   8 +
 10 files changed, 489 insertions(+)

diff --git a/TODO b/TODO
index d2cf0ae..02a095e 100644
--- a/TODO
+++ b/TODO
@@ -136,6 +136,12 @@ nbdkit-rate-filter:
 * split large requests to avoid long, lumpy sleeps when request size
   is much larger than rate limit
 
+nbdkit-retry-filter:
+
+* allow user to specify which errors cause a retry and which ones are
+  passed through; for example there's probably no point retrying on
+  ENOMEM
+
 Filters for security
 --------------------
 
diff --git a/configure.ac b/configure.ac
index 8261a7f..7ce3dbb 100644
--- a/configure.ac
+++ b/configure.ac
@@ -872,6 +872,7 @@ filters="\
         partition \
         rate \
         readahead \
+        retry \
         stats \
         truncate \
         xz \
@@ -952,6 +953,7 @@ AC_CONFIG_FILES([Makefile
                  filters/partition/Makefile
                  filters/rate/Makefile
                  filters/readahead/Makefile
+                 filters/retry/Makefile
                  filters/stats/Makefile
                  filters/truncate/Makefile
                  filters/xz/Makefile
diff --git a/docs/nbdkit-captive.pod b/docs/nbdkit-captive.pod
index 59df669..0adf0a0 100644
--- a/docs/nbdkit-captive.pod
+++ b/docs/nbdkit-captive.pod
@@ -92,6 +92,9 @@ help performance:
  nbdkit -U - --filter=readahead curl https://example.com/disk.img \
         --run 'qemu-img convert $nbd disk.img'
 
+If the source suffers from temporary network failures
+L<nbdkit-retry-filter(1)> may help.
+
 To overwrite a file inside an uncompressed tar file (the file being
 overwritten must be the same size), use L<nbdkit-tar-plugin(1)> like
 this:
diff --git a/filters/retry/Makefile.am b/filters/retry/Makefile.am
new file mode 100644
index 0000000..86011bb
--- /dev/null
+++ b/filters/retry/Makefile.am
@@ -0,0 +1,67 @@
+# nbdkit
+# Copyright (C) 2019 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS
IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+include $(top_srcdir)/common-rules.mk
+
+EXTRA_DIST = nbdkit-retry-filter.pod
+
+filter_LTLIBRARIES = nbdkit-retry-filter.la
+
+nbdkit_retry_filter_la_SOURCES = \
+	retry.c \
+	$(top_srcdir)/include/nbdkit-filter.h \
+	$(NULL)
+
+nbdkit_retry_filter_la_CPPFLAGS = \
+	-I$(top_srcdir)/include \
+	-I$(top_srcdir)/common/include \
+	-I$(top_srcdir)/common/utils \
+	$(NULL)
+nbdkit_retry_filter_la_CFLAGS = $(WARNINGS_CFLAGS)
+nbdkit_retry_filter_la_LDFLAGS = \
+	-module -avoid-version -shared \
+	-Wl,--version-script=$(top_srcdir)/filters/filters.syms \
+	$(NULL)
+nbdkit_retry_filter_la_LIBADD = \
+	$(top_builddir)/common/utils/libutils.la \
+	$(NULL)
+
+if HAVE_POD
+
+man_MANS = nbdkit-retry-filter.1
+CLEANFILES += $(man_MANS)
+
+nbdkit-retry-filter.1: nbdkit-retry-filter.pod
+	$(PODWRAPPER) --section=1 --man $@ \
+	    --html $(top_builddir)/html/$@.html \
+	    $<
+
+endif HAVE_POD
diff --git a/filters/retry/nbdkit-retry-filter.pod
b/filters/retry/nbdkit-retry-filter.pod
new file mode 100644
index 0000000..aeed79c
--- /dev/null
+++ b/filters/retry/nbdkit-retry-filter.pod
@@ -0,0 +1,103 @@
+=head1 NAME
+
+nbdkit-retry-filter - reopen connection on error
+
+=head1 SYNOPSIS
+
+ nbdkit --filter=retry PLUGIN
+
+=head1 DESCRIPTION
+
+C<nbdkit-retry-filter> is a filter that transparently reopens the
+plugin connection when an error is encountered.  It can be used to
+make long-running copy operations reliable in the presence of
+temporary network failures, without requiring any changes to the
+plugin.
+
+A number of parameters are available to control:
+
+=over 4
+
+=item *
+
+How many times we retry.
+
+=item *
+
+The delay between retries, and whether we wait longer each time (known
+as "exponential back-off").
+
+=item *
+
+If we reopen the plugin in read-only mode after the first failure.
+
+=back
+
+The default (no parameters) is designed to offer a happy medium
+between recovering from short temporary failures but not doing
+anything too bad when permanent or unrecoverable failures happen: We
+retry 5 times with exponential back-off, waiting in total about 1
+minute before we give up.
+
+=head1 EXAMPLE
+
+In this example we copy and convert a large file using
+L<nbdkit-ssh-plugin(1)>, L<qemu-img(1)> and
L<nbdkit-captive(1)>.
+
+ nbdkit -U - ssh --filter=retry host=remote /var/tmp/test.iso \
+   --run 'qemu-img convert -p -f raw $nbd -O qcow2 test.qcow2'
+
+Without I<--filter=retry> a temporary failure would cause the copy to
+fail (for example, the remote host’s firewall is restarted causing the
+SSH connection to be closed).  Adding this filter means that it may be
+possible to transparently recover.
+
+=head1 PARAMETERS
+
+=over 4
+
+=item B<retries=>N
+
+The number of times any single operation will be retried before we
+give up and fail the operation.  The default is 5.
+
+=item B<retry-delay=>N
+
+The number of seconds to wait before retrying.  The default is 2
+seconds.
+
+=item B<retry-exponential=yes>
+
+Use exponential back-off.  The retry delay is doubled between each
+retry.  This is the default.
+
+=item B<retry-exponential=no>
+
+Do not use exponential back-off.  The retry delay is the same between
+each retry.
+
+=item B<retry-readonly=yes>
+
+As soon as a failure occurs and we retry, switch the underlying plugin
+permanently to read-only mode.
+
+=item B<retry-readonly=no>
+
+Do not change the read-write/read-only mode of the plugin when
+retrying.  This is the default.
+
+=back
+
+=head1 SEE ALSO
+
+L<nbdkit(1)>,
+L<nbdkit-filter(3)>,
+L<nbdkit-readahead-filter(1)>.
+
+=head1 AUTHORS
+
+Richard W.M. Jones
+
+=head1 COPYRIGHT
+
+Copyright (C) 2019 Red Hat Inc.
diff --git a/filters/retry/retry.c b/filters/retry/retry.c
new file mode 100644
index 0000000..cee7a5b
--- /dev/null
+++ b/filters/retry/retry.c
@@ -0,0 +1,296 @@
+/* nbdkit
+ * Copyright (C) 2019 Red Hat Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * * Neither the name of Red Hat nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS
IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+ * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <inttypes.h>
+#include <string.h>
+#include <sys/time.h>
+
+#include <nbdkit-filter.h>
+
+#include "cleanup.h"
+
+static int retries = 5;         /* 0 = filter is disabled */
+static int initial_delay = 2;
+static bool exponential_backoff = true;
+static bool force_readonly = false;
+
+static int
+retry_config (nbdkit_next_config *next, void *nxdata,
+              const char *key, const char *value)
+{
+  int r;
+
+  if (strcmp (key, "retries") == 0) {
+    if (sscanf (value, "%d", &retries) != 1 || retries < 0) {
+      nbdkit_error ("cannot parse retries: %s", value);
+      return -1;
+    }
+  }
+  else if (strcmp (key, "retry-delay") == 0) {
+    if (sscanf (value, "%d", &initial_delay) != 1 ||
initial_delay <= 0) {
+      nbdkit_error ("cannot parse retry-delay: %s", value);
+      return -1;
+    }
+  }
+  else if (strcmp (key, "retry-exponential") == 0) {
+    r = nbdkit_parse_bool (value);
+    if (r == -1)
+      return -1;
+    exponential_backoff = r;
+    return 0;
+  }
+  else if (strcmp (key, "retry-readonly") == 0) {
+    r = nbdkit_parse_bool (value);
+    if (r == -1)
+      return -1;
+    force_readonly = r;
+    return 0;
+  }
+
+  return next (nxdata, key, value);
+}
+
+#define retry_config_help \
+  "retries=<N>              Number of retries (default: 5).\n"
\
+  "retry-delay=<N>          Second to wait before retry (default:
2).\n" \
+  "retry-exponential=yes|no Exponential back-off (default: yes).\n" \
+  "retry-readonly=yes|no    Force read-only on failure (default:
no).\n"
+
+struct retry_handle {
+  int readonly;                 /* Save original readonly setting. */
+};
+
+static void *
+retry_open (nbdkit_next_open *next, void *nxdata, int readonly)
+{
+  struct retry_handle *h;
+
+  if (next (nxdata, readonly) == -1)
+    return NULL;
+
+  h = malloc (sizeof *h);
+  if (h == NULL) {
+    nbdkit_error ("malloc: %m");
+    return NULL;
+  }
+
+  h->readonly = readonly;
+
+  return h;
+}
+
+static void
+retry_close (void *handle)
+{
+  struct retry_handle *h = handle;
+
+  free (h);
+}
+
+/* This function encapsulates the common retry logic used across all
+ * data commands.  If it returns true then the data command will retry
+ * the operation.  ‘struct retry_data’ is stack data saved between
+ * retries within the same command, and is initialized to zero.
+ */
+struct retry_data {
+  int retry;                    /* Retry number (0 = first time). */
+  int delay;                    /* Seconds to wait before retrying. */
+};
+
+static bool
+do_retry (struct retry_handle *h,
+          struct retry_data *data,
+          struct nbdkit_next_ops *next_ops, void *nxdata,
+          int *err)
+{
+  /* If it's the first retry, initialize the other fields in *data. */
+  if (data->retry == 0)
+    data->delay = initial_delay;
+
+  /* Log the original errno since it will be lost when we retry. */
+  nbdkit_debug ("retry %d: original errno = %d", data->retry+1,
*err);
+
+  if (data->retry >= retries) {
+    nbdkit_debug ("could not recover after %d retries", retries);
+    return false;
+  }
+
+  nbdkit_debug ("waiting %d seconds before retrying",
data->delay);
+  if (nbdkit_nanosleep (data->delay, 0) == -1) {
+    /* We could do this but it would overwrite the more important
+     * errno from the underlying data call.
+     */
+    /* *err = errno; */
+    return false;
+  }
+
+  /* Reopen the connection. */
+  if (next_ops->reopen (nxdata, h->readonly || force_readonly) == -1)
+    return false;
+
+  /* Update *data in case we are called again. */
+  data->retry++;
+  if (exponential_backoff)
+    data->delay *= 2;
+
+  /* Retry the data command. */
+  return true;
+}
+
+static int
+retry_pread (struct nbdkit_next_ops *next_ops, void *nxdata,
+             void *handle, void *buf, uint32_t count, uint64_t offset,
+             uint32_t flags, int *err)
+{
+  struct retry_handle *h = handle;
+  struct retry_data data = {0};
+  int r;
+
+ again:
+  r = next_ops->pread (nxdata, buf, count, offset, flags, err);
+  if (r == -1 && do_retry (h, &data, next_ops, nxdata, err)) goto
again;
+
+  return r;
+}
+
+/* Write. */
+static int
+retry_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata,
+              void *handle,
+              const void *buf, uint32_t count, uint64_t offset,
+              uint32_t flags, int *err)
+{
+  struct retry_handle *h = handle;
+  struct retry_data data = {0};
+  int r;
+
+ again:
+  r = next_ops->pwrite (nxdata, buf, count, offset, flags, err);
+  if (r == -1 && do_retry (h, &data, next_ops, nxdata, err)) goto
again;
+
+  return r;
+}
+
+/* Trim. */
+static int
+retry_trim (struct nbdkit_next_ops *next_ops, void *nxdata,
+            void *handle,
+            uint32_t count, uint64_t offset, uint32_t flags,
+            int *err)
+{
+  struct retry_handle *h = handle;
+  struct retry_data data = {0};
+  int r;
+
+ again:
+  r = next_ops->trim (nxdata, count, offset, flags, err);
+  if (r == -1 && do_retry (h, &data, next_ops, nxdata, err)) goto
again;
+
+  return r;
+}
+
+/* Zero. */
+static int
+retry_zero (struct nbdkit_next_ops *next_ops, void *nxdata,
+            void *handle,
+            uint32_t count, uint64_t offset, uint32_t flags,
+            int *err)
+{
+  struct retry_handle *h = handle;
+  struct retry_data data = {0};
+  int r;
+
+ again:
+  r = next_ops->zero (nxdata, count, offset, flags, err);
+  if (r == -1 && do_retry (h, &data, next_ops, nxdata, err)) goto
again;
+
+  return r;
+}
+
+/* Extents. */
+static int
+retry_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
+               void *handle,
+               uint32_t count, uint64_t offset, uint32_t flags,
+               struct nbdkit_extents *extents, int *err)
+{
+  struct retry_handle *h = handle;
+  struct retry_data data = {0};
+  int r;
+
+ again:
+  r = next_ops->extents (nxdata, count, offset, flags, extents, err);
+  if (r == -1 && do_retry (h, &data, next_ops, nxdata, err)) goto
again;
+
+  return r;
+}
+
+/* Cache. */
+static int
+retry_cache (struct nbdkit_next_ops *next_ops, void *nxdata,
+             void *handle,
+             uint32_t count, uint64_t offset, uint32_t flags,
+             int *err)
+{
+  struct retry_handle *h = handle;
+  struct retry_data data = {0};
+  int r;
+
+ again:
+  r = next_ops->cache (nxdata, count, offset, flags, err);
+  if (r == -1 && do_retry (h, &data, next_ops, nxdata, err)) goto
again;
+
+  return r;
+}
+
+static struct nbdkit_filter filter = {
+  .name              = "retry",
+  .longname          = "nbdkit retry filter",
+  .config            = retry_config,
+  .config_help       = retry_config_help,
+  .open              = retry_open,
+  .close             = retry_close,
+  .pread             = retry_pread,
+  .pwrite            = retry_pwrite,
+  .trim              = retry_trim,
+  .zero              = retry_zero,
+  .extents           = retry_extents,
+  .cache             = retry_cache,
+};
+
+NBDKIT_REGISTER_FILTER(filter)
diff --git a/plugins/curl/nbdkit-curl-plugin.pod
b/plugins/curl/nbdkit-curl-plugin.pod
index 1c96b4a..2d5c564 100644
--- a/plugins/curl/nbdkit-curl-plugin.pod
+++ b/plugins/curl/nbdkit-curl-plugin.pod
@@ -159,6 +159,7 @@ L<CURLOPT_COOKIE(3)>
 L<CURLOPT_VERBOSE(3)>,
 L<nbdkit(1)>,
 L<nbdkit-readahead-filter(1)>,
+L<nbdkit-retry-filter(1)>,
 L<nbdkit-ssh-plugin(1)>,
 L<nbdkit-plugin(3)>,
 L<http://curl.haxx.se>.
diff --git a/plugins/ssh/nbdkit-ssh-plugin.pod
b/plugins/ssh/nbdkit-ssh-plugin.pod
index 2010837..316e852 100644
--- a/plugins/ssh/nbdkit-ssh-plugin.pod
+++ b/plugins/ssh/nbdkit-ssh-plugin.pod
@@ -288,6 +288,7 @@ libssh itself.
 L<nbdkit(1)>,
 L<nbdkit-curl-plugin(1)>,
 L<nbdkit-readahead-filter(1)>,
+L<nbdkit-retry-filter(1)>,
 L<nbdkit-plugin(3)>,
 L<ssh(1)>,
 L<ssh-agent(1)>,
diff --git a/plugins/vddk/nbdkit-vddk-plugin.pod
b/plugins/vddk/nbdkit-vddk-plugin.pod
index 8820a50..9e451c6 100644
--- a/plugins/vddk/nbdkit-vddk-plugin.pod
+++ b/plugins/vddk/nbdkit-vddk-plugin.pod
@@ -431,6 +431,8 @@ writing.
 
 L<nbdkit(1)>,
 L<nbdkit-plugin(3)>,
+L<nbdkit-readahead-filter(1)>,
+L<nbdkit-retry-filter(1)>,
 L<virsh(1)>,
 L<https://www.vmware.com/support/developer/vddk/>
 
diff --git a/server/backend.c b/server/backend.c
index 8a434bd..b8c5742 100644
--- a/server/backend.c
+++ b/server/backend.c
@@ -224,9 +224,17 @@ backend_valid_range (struct backend *b, struct connection
*conn,
 int
 backend_reopen (struct backend *b, struct connection *conn, int readonly)
 {
+  struct b_conn_handle *h = &conn->handles[b->i];
+
   debug ("%s: reopen", b->name);
 
   b->close (b, conn);
+
+  /* This forces .open to recalculate h->can_write, which might have
+   * changed since we may have a new readonly value.
+   */
+  h->can_write = -1;
+
   return backend_open (b, conn, readonly);
 }
 
-- 
2.23.0

Richard W.M. Jones

2019-Sep-19 11:37 UTC

head link

Re: [Libguestfs] [PATCH nbdkit 2/2] Add new retry filter.

On Thu, Sep 19, 2019 at 12:34:17PM +0100, Richard W.M. Jones
wrote:> diff --git a/server/backend.c b/server/backend.c
> index 8a434bd..b8c5742 100644
> --- a/server/backend.c
> +++ b/server/backend.c
> @@ -224,9 +224,17 @@ backend_valid_range (struct backend *b, struct
connection *conn,
>  int
>  backend_reopen (struct backend *b, struct connection *conn, int readonly)
>  {
> +  struct b_conn_handle *h = &conn->handles[b->i];
> +
>    debug ("%s: reopen", b->name);
>  
>    b->close (b, conn);
> +
> +  /* This forces .open to recalculate h->can_write, which might have
> +   * changed since we may have a new readonly value.
> +   */
> +  h->can_write = -1;
> +
>    return backend_open (b, conn, readonly);
>  }
Obviously this hunk should be in the first patch ...

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

Nenad Peric

2019-Sep-19 11:47 UTC

head link

Re: [Libguestfs] [PATCH nbdkit 0/2] Add new retry filter.

A theoretical way to test this against any other protocol is to have a
firewall rule which can be triggered locally which just drops the traffic
and then re-enables the traffic in a while (short or long, can be
decided/tweaked).
This way, there is no need to think about which protocol is used, when we
kill the connectivity.
The only consideration could probably be that the dropped connections would
cover only the ports we need.
To reduce the size of the hammer :)


*Nenad Perić*

PRINCIPAL SOFTWARE ENGINEER

Red Hat - Migration Engineering

nenad@redhat.com

<http://redhat.com>


On Thu, Sep 19, 2019 at 1:34 PM Richard W.M. Jones <rjones@redhat.com>
wrote:
> This is a retry filter implementation as outlined here:
> https://www.redhat.com/archives/libguestfs/2019-September/msg00167.html
>
> It is only lightly tested.  One way to test it is to try an SSH copy
> (see the commit message for patch 2/2), and in the middle of the copy
> kill the per-connection sshd on the remote machine.  You will see that
> the copy recovers after a few seconds.  Add the nbdkit -v option to
> see some useful messages from the filter:
>
>   ...
>   nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=216006656
>   nbdkit: ssh[1]: error: read failed: Socket error: Connection reset by
> peer (-1)
>   nbdkit: ssh[1]: debug: retry 1: original errno = 5
>   nbdkit: ssh[1]: debug: waiting 2 seconds before retrying
>   nbdkit: ssh[1]: debug: ssh: reopen
>   nbdkit: ssh[1]: debug: close
>   nbdkit: ssh[1]: error: cannot close file: Socket error: Connection reset
> by peer
>   nbdkit: ssh[1]: debug: ssh: open readonly=0
>   nbdkit: ssh[1]: debug: opened libssh handle
>   nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=216006656
>   ...
>
> It really needs an actual test, which is tricky to implement.  My
> thinking currently is that a custom (sh) plugin is the way to go, but
> it would also be nice to test curl/ssh/vddk with this filter -- if
> anyone has any suggestions on how to do that ...
>
> This patch will probably conflict with Eric's series here:
> https://www.redhat.com/archives/libguestfs/2019-September/msg00180.html
> but that shouldn't be too hard to fix up.
>
> Rich.
>
>
>

Martin Kletzander

2019-Sep-19 12:04 UTC

head link

Re: [Libguestfs] [PATCH nbdkit 0/2] Add new retry filter.

On Thu, Sep 19, 2019 at 01:47:45PM +0200, Nenad Peric
wrote:>A theoretical way to test this against any other protocol is to have a
>firewall rule which can be triggered locally which just drops the traffic
>and then re-enables the traffic in a while (short or long, can be
>decided/tweaked).
>This way, there is no need to think about which protocol is used, when we
>kill the connectivity.
>The only consideration could probably be that the dropped connections would
>cover only the ports we need.
>To reduce the size of the hammer :)
>
Good idea, you just need to keep in mind that the underlying protocols might
have their own keep-alive methods, which should be removed or at least minimised
when testing this.
>
>*Nenad Perić*
>
>PRINCIPAL SOFTWARE ENGINEER
>
>Red Hat - Migration Engineering
>
>nenad@redhat.com
>
><http://redhat.com>
>
>
>On Thu, Sep 19, 2019 at 1:34 PM Richard W.M. Jones <rjones@redhat.com>
>wrote:
>
>> This is a retry filter implementation as outlined here:
>> https://www.redhat.com/archives/libguestfs/2019-September/msg00167.html
>>
>> It is only lightly tested.  One way to test it is to try an SSH copy
>> (see the commit message for patch 2/2), and in the middle of the copy
>> kill the per-connection sshd on the remote machine.  You will see that
>> the copy recovers after a few seconds.  Add the nbdkit -v option to
>> see some useful messages from the filter:
>>
>>   ...
>>   nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=216006656
>>   nbdkit: ssh[1]: error: read failed: Socket error: Connection reset by
>> peer (-1)
>>   nbdkit: ssh[1]: debug: retry 1: original errno = 5
>>   nbdkit: ssh[1]: debug: waiting 2 seconds before retrying
>>   nbdkit: ssh[1]: debug: ssh: reopen
>>   nbdkit: ssh[1]: debug: close
>>   nbdkit: ssh[1]: error: cannot close file: Socket error: Connection
reset
>> by peer
>>   nbdkit: ssh[1]: debug: ssh: open readonly=0
>>   nbdkit: ssh[1]: debug: opened libssh handle
>>   nbdkit: ssh[1]: debug: ssh: pread count=2097152 offset=216006656
>>   ...
>>
>> It really needs an actual test, which is tricky to implement.  My
>> thinking currently is that a custom (sh) plugin is the way to go, but
>> it would also be nice to test curl/ssh/vddk with this filter -- if
>> anyone has any suggestions on how to do that ...
>>
>> This patch will probably conflict with Eric's series here:
>> https://www.redhat.com/archives/libguestfs/2019-September/msg00180.html
>> but that shouldn't be too hard to fix up.
>>
>> Rich.
>>
>>
>>

Richard W.M. Jones

2019-Sep-19 12:49 UTC

head link

Re: [Libguestfs] [PATCH nbdkit 0/2] Add new retry filter.

On Thu, Sep 19, 2019 at 01:47:45PM +0200, Nenad Peric
wrote:> A theoretical way to test this against any other protocol is to have a
> firewall rule which can be triggered locally which just drops the traffic
> and then re-enables the traffic in a while (short or long, can be
> decided/tweaked).
We generally want our tests to run without root and without doing
weird stuff to the machine (they run from ‘make check’) so I'm afraid
updating firewall rules isn't going to work :-(

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html

Reasonably Related Threads

Search for more maybe matching threads

Libguestfs - Sep 2019 - Re: [PATCH nbdkit 2/2] Add new retry filter.

[Libguestfs] [PATCH nbdkit 0/2] Add new retry filter.

[Libguestfs] [PATCH nbdkit 1/2] filters: Implement next_ops .reopen call.

[Libguestfs] [PATCH nbdkit 2/2] Add new retry filter.

Re: [Libguestfs] [PATCH nbdkit 2/2] Add new retry filter.

Re: [Libguestfs] [PATCH nbdkit 0/2] Add new retry filter.

Re: [Libguestfs] [PATCH nbdkit 0/2] Add new retry filter.

Re: [Libguestfs] [PATCH nbdkit 0/2] Add new retry filter.

Reasonably Related Threads