thr3ads.net - Xen devel - [PATCH 0 of 2 V6] libxl: Remus support [May 2012]

Shriram Rajagopalan

2012-May-17 19:48 UTC

[PATCH 0 of 2 V6] libxl: Remus support

This patch series introduces a basic framework to incorporate
Remus into the libxl toolstack. The only functionality currently
implemented is memory checkpointing.

These patches depend on the "libxl: refactor suspend/resume code"
patch series.

Changes in V6:
 * Rebase to current tip

Changes in V5:
 * Rebase to current tip

Changes in V4:
 * more explanation on blackhole replication in xl.pod
 * moved comment on save_callbacks to xenguest.h
 * rebased to current tip, removed useless comments.

Changes in V3:
 * Rebased w.r.t Stefano's patches.

Changes in V2:
 * Move libxl_domain_remus_start into the save_callbacks implementation patch
 * return proper error codes instead of -1.
 * Add documentation to docs/man/xl.pod.1


Shriram

Shriram Rajagopalan

2012-May-17 19:48 UTC

[PATCH 1 of 2 V6] libxl: Remus - suspend/postflush/commit callbacks

# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1337283427 25200
# Node ID 496ff6ce5bb63a2f034d2a861f34cfa8cbf06552
# Parent  24c462a07e167e4ce35a22197dbef74853b08359
libxl: Remus - suspend/postflush/commit callbacks

 * Add libxl callback functions for Remus checkpoint suspend, postflush
   (aka resume) and checkpoint commit callbacks.
 * suspend callback is a stub that just bounces off
   libxl__domain_suspend_common_callback - which suspends the domain and
   saves the device model state to a file.
 * resume callback currently just resumes the domain (and the device model).
 * commit callback just writes out the saved device model state to the
   network and sleeps for the checkpoint interval.
 * Introduce a new public API, libxl_domain_remus_start (currently a stub)
   that sets up the network and disk buffer and initiates continuous
   checkpointing.

 * Future patches will augment these callbacks/functions with more
   functionality, like issuing network buffer plug/unplug commands, disk
   checkpoint commands, etc.
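
The ordering of the three callbacks described above can be illustrated with
a small, self-contained sketch. This is not the libxc or libxl code: the
names demo_callbacks, demo_state and demo_checkpoint_loop are hypothetical,
and the loop only mimics the sequence suspend, postcopy (resume), checkpoint
(commit) with the return conventions used in this patch (nonzero means
success for suspend/postcopy; checkpoint returns 1 to continue and 0 to
stop).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three Remus-related members of
 * struct save_callbacks; not the real libxc definition. */
struct demo_callbacks {
    int (*suspend)(void *data);    /* nonzero on success */
    int (*postcopy)(void *data);   /* nonzero on success */
    int (*checkpoint)(void *data); /* 0: stop, 1: take another checkpoint */
    void *data;
};

/* Bookkeeping for the demo callbacks below. */
struct demo_state {
    int epochs_left;               /* checkpoints to take before stopping */
    int suspends, resumes, commits;
};

static int demo_suspend(void *data)
{
    struct demo_state *s = data;
    s->suspends++;   /* real code would suspend the guest here */
    return 1;
}

static int demo_postcopy(void *data)
{
    struct demo_state *s = data;
    s->resumes++;    /* real code would resume guest and device model here */
    return 1;
}

static int demo_checkpoint(void *data)
{
    struct demo_state *s = data;
    s->commits++;    /* real code would send device model state and sleep */
    return --s->epochs_left > 0;
}

/* Drive the callbacks in the order described in the commit message.
 * Returns the number of completed checkpoints, or -1 if the suspend
 * or postcopy callback reports failure. */
static int demo_checkpoint_loop(const struct demo_callbacks *cb)
{
    int done = 0;

    assert(cb != NULL && cb->suspend && cb->postcopy && cb->checkpoint);
    for (;;) {
        if (!cb->suspend(cb->data))
            return -1;
        /* dirty pages would be copied into an output buffer here */
        if (!cb->postcopy(cb->data))
            return -1;
        /* the buffer would be flushed to the backup here */
        done++;
        if (!cb->checkpoint(cb->data))
            break;
    }
    return done;
}
```

With epochs_left set to 3 and the three demo callbacks installed, the loop
runs three suspend/resume/commit rounds and then stops because the
checkpoint callback returns 0.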

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

diff -r 24c462a07e16 -r 496ff6ce5bb6 tools/libxc/xenguest.h
--- a/tools/libxc/xenguest.h	Thu May 17 12:37:05 2012 -0700
+++ b/tools/libxc/xenguest.h	Thu May 17 12:37:07 2012 -0700
@@ -33,10 +33,29 @@
 
 /* callbacks provided by xc_domain_save */
 struct save_callbacks {
+    /* Called after expiration of checkpoint interval,
+     * to suspend the guest.
+     */
     int (*suspend)(void* data);
-    /* callback to rendezvous with external checkpoint functions */
+
+    /* Called after the guest's dirty pages have been
+     *  copied into an output buffer.
+     * Callback function resumes the guest & the device model,
+     *  returns to xc_domain_save.
+     * xc_domain_save then flushes the output buffer, while the
+     *  guest continues to run.
+     */
     int (*postcopy)(void* data);
-    /* returns:
+
+    /* Called after the memory checkpoint has been flushed
+     * out into the network. Typical actions performed in this
+     * callback include:
+     *   (a) send the saved device model state (for HVM guests),
+     *   (b) wait for checkpoint ack
+     *   (c) release the network output buffer pertaining to the acked checkpoint.
+     *   (d) sleep for the checkpoint interval.
+     *
+     * returns:
      * 0: terminate checkpointing gracefully
      * 1: take another checkpoint */
     int (*checkpoint)(void* data);
diff -r 24c462a07e16 -r 496ff6ce5bb6 tools/libxl/libxl.c
--- a/tools/libxl/libxl.c	Thu May 17 12:37:05 2012 -0700
+++ b/tools/libxl/libxl.c	Thu May 17 12:37:07 2012 -0700
@@ -619,6 +619,41 @@
     return ptr;
 }
 
+/* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
+int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
+                             uint32_t domid, int send_fd, int recv_fd)
+{
+    GC_INIT(ctx);
+    libxl_domain_type type = libxl__domain_type(gc, domid);
+    int rc = 0;
+
+    if (info == NULL) {
+        LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
+                   "No remus_info structure supplied for domain %d", domid);
+        rc = ERROR_INVAL;
+        goto remus_fail;
+    }
+
+    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+
+    /* Point of no return */
+    rc = libxl__domain_suspend_common(gc, domid, send_fd, type, /* live */ 1,
+                                      /* debug */ 0, info);
+
+    /* 
+     * With Remus, if we reach this point, it means either
+     * backup died or some network error occurred preventing us
+     * from sending checkpoints.
+     */
+
+    /* TBD: Remus cleanup - i.e. detach qdisc, release other
+     * resources.
+     */
+ remus_fail:
+    GC_FREE;
+    return rc;
+}
+
 int libxl_domain_suspend(libxl_ctx *ctx, libxl_domain_suspend_info *info,
                          uint32_t domid, int fd)
 {
@@ -628,7 +663,9 @@
     int debug = info != NULL && info->flags & XL_SUSPEND_DEBUG;
     int rc = 0;
 
-    rc = libxl__domain_suspend_common(gc, domid, fd, type, live, debug);
+    rc = libxl__domain_suspend_common(gc, domid, fd, type, live, debug,
+                                      /* No Remus */ NULL);
+
     if (!rc && type == LIBXL_DOMAIN_TYPE_HVM)
         rc = libxl__domain_save_device_model(gc, domid, fd);
     GC_FREE;
diff -r 24c462a07e16 -r 496ff6ce5bb6 tools/libxl/libxl.h
--- a/tools/libxl/libxl.h	Thu May 17 12:37:05 2012 -0700
+++ b/tools/libxl/libxl.h	Thu May 17 12:37:07 2012 -0700
@@ -525,6 +525,8 @@
 
 void libxl_domain_config_init(libxl_domain_config *d_config);
 void libxl_domain_config_dispose(libxl_domain_config *d_config);
+int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
+                             uint32_t domid, int send_fd, int recv_fd);
 int libxl_domain_suspend(libxl_ctx *ctx, libxl_domain_suspend_info *info,
                           uint32_t domid, int fd);
 
diff -r 24c462a07e16 -r 496ff6ce5bb6 tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c	Thu May 17 12:37:05 2012 -0700
+++ b/tools/libxl/libxl_dom.c	Thu May 17 12:37:07 2012 -0700
@@ -566,6 +566,8 @@
     int hvm;
     unsigned int flags;
     int guest_responded;
+    int save_fd; /* Migration stream fd (for Remus) */
+    int interval; /* checkpoint interval (for Remus) */
 };
 
 static int libxl__domain_suspend_common_switch_qemu_logdirty(int domid, unsigned int enable, void *data)
@@ -848,9 +850,43 @@
     return 0;
 }
 
+static int libxl__remus_domain_suspend_callback(void *data)
+{
+    /* TODO: Issue disk and network checkpoint reqs. */
+    return libxl__domain_suspend_common_callback(data);
+}
+
+static int libxl__remus_domain_resume_callback(void *data)
+{
+    struct suspendinfo *si = data;
+    libxl_ctx *ctx = libxl__gc_owner(si->gc);
+
+    /* Resumes the domain and the device model */
+    if (libxl_domain_resume(ctx, si->domid, /* Fast Suspend */1))
+        return 0;
+
+    /* TODO: Deal with disk. Start a new network output buffer */
+    return 1;
+}
+
+static int libxl__remus_domain_checkpoint_callback(void *data)
+{
+    struct suspendinfo *si = data;
+
+    /* This would go into tailbuf. */
+    if (si->hvm &&
+        libxl__domain_save_device_model(si->gc, si->domid, si->save_fd))
+        return 0;
+
+    /* TODO: Wait for disk and memory ack, release network buffer */
+    usleep(si->interval * 1000);
+    return 1;
+}
+
 int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
                                  libxl_domain_type type,
-                                 int live, int debug)
+                                 int live, int debug,
+                                 const libxl_domain_remus_info *r_info)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     int flags;
@@ -881,10 +917,20 @@
         return ERROR_INVAL;
     }
 
+    memset(&si, 0, sizeof(si));
     flags = (live) ? XCFLAGS_LIVE : 0
           | (debug) ? XCFLAGS_DEBUG : 0
           | (hvm) ? XCFLAGS_HVM : 0;
 
+    if (r_info != NULL) {
+        si.interval = r_info->interval;
+        if (r_info->compression)
+            flags |= XCFLAGS_CHECKPOINT_COMPRESS;
+        si.save_fd = fd;
+    }
+    else
+        si.save_fd = -1;
+
     si.domid = domid;
     si.flags = flags;
     si.hvm = hvm;
@@ -908,7 +954,13 @@
     }
 
     memset(&callbacks, 0, sizeof(callbacks));
-    callbacks.suspend = libxl__domain_suspend_common_callback;
+    if (r_info != NULL) {
+        callbacks.suspend = libxl__remus_domain_suspend_callback;
+        callbacks.postcopy = libxl__remus_domain_resume_callback;
+        callbacks.checkpoint = libxl__remus_domain_checkpoint_callback;
+    } else
+        callbacks.suspend = libxl__domain_suspend_common_callback;
+
     callbacks.switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
     callbacks.toolstack_save = libxl__toolstack_save;
     callbacks.data = &si;
diff -r 24c462a07e16 -r 496ff6ce5bb6 tools/libxl/libxl_internal.h
--- a/tools/libxl/libxl_internal.h	Thu May 17 12:37:05 2012 -0700
+++ b/tools/libxl/libxl_internal.h	Thu May 17 12:37:07 2012 -0700
@@ -757,7 +757,8 @@
                                          int fd);
 _hidden int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
                                          libxl_domain_type type,
-                                         int live, int debug);
+                                         int live, int debug,
+                                         const libxl_domain_remus_info *r_info);
 _hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_suspend_device_model(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
diff -r 24c462a07e16 -r 496ff6ce5bb6 tools/libxl/libxl_types.idl
--- a/tools/libxl/libxl_types.idl	Thu May 17 12:37:05 2012 -0700
+++ b/tools/libxl/libxl_types.idl	Thu May 17 12:37:07 2012 -0700
@@ -454,6 +454,12 @@
     ("weight", integer),
     ])
 
+libxl_domain_remus_info = Struct("domain_remus_info",[
+    ("interval",     integer),
+    ("blackhole",    bool),
+    ("compression",  bool),
+    ])
+
 libxl_event_type = Enumeration("event_type", [
     (1, "DOMAIN_SHUTDOWN"),
     (2, "DOMAIN_DEATH"),

Shriram Rajagopalan

2012-May-17 19:48 UTC

[PATCH 2 of 2 V6] libxl: Remus - xl remus command

# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1337283430 25200
# Node ID 92bf8bd9ae5783a8126ffae75da9425db7c6e3d0
# Parent  496ff6ce5bb63a2f034d2a861f34cfa8cbf06552
libxl: Remus - xl remus command

xl remus acts as a frontend to enable remus for a given domain.
 * At the moment, only memory checkpointing and blackhole replication are
   supported. Support for disk checkpointing and network buffering will
   be added in future.
 * Replication is done over ssh connection currently (like live migration
   with xl). Future versions will have an option to use simple tcp socket
   based replication channel (for both Remus & live migration).
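
As a rough illustration of the ssh-based channel described above, the
following sketch builds the command line that the sender would hand to sh in
order to start the receiver. It is modelled on the asprintf() call in
main_remus further down, but it is only a sketch: build_remus_rune is a
hypothetical helper, it uses a caller-supplied buffer instead of asprintf(),
and it appends " -e" without the extra space that the format string in the
patch produces. The empty-host check is also an assumption added for the
example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the receiver command ("rune") into buf.
 * An empty ssh_command means "run <host> as the command itself",
 * mirroring the -s option described in the man page hunk.
 * Returns 0 on success, -1 if host is empty or buf is too small. */
static int build_remus_rune(char *buf, size_t len, const char *ssh_command,
                            const char *host, int daemonize)
{
    int n;

    assert(buf != NULL && ssh_command != NULL && host != NULL);
    if (strlen(host) == 0)
        return -1;

    if (ssh_command[0] == '\0')
        n = snprintf(buf, len, "%s", host);
    else
        n = snprintf(buf, len, "exec %s %s xl migrate-receive -r%s",
                     ssh_command, host, daemonize ? "" : " -e");

    /* snprintf reports the length it wanted; reject truncated output */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, with ssh_command "ssh", a made-up host name "backup1" and
daemonize set, the buffer holds "exec ssh backup1 xl migrate-receive -r".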

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 docs/man/xl.pod.1
--- a/docs/man/xl.pod.1	Thu May 17 12:37:07 2012 -0700
+++ b/docs/man/xl.pod.1	Thu May 17 12:37:10 2012 -0700
@@ -381,6 +381,41 @@
 
 =back
 
+=item B<remus> [I<OPTIONS>] I<domain-id> I<host>
+
+Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
+mechanism between the two hosts.
+
+B<OPTIONS>
+
+=over 4
+
+=item B<-i> I<MS>
+
+Checkpoint domain memory every MS milliseconds (default 200ms).
+
+=item B<-b>
+
+Do not checkpoint the disk. Replicate memory checkpoints to /dev/null
+(blackhole).  Network output buffering remains enabled (unless --no-net is
+supplied).  Generally useful for debugging.
+
+=item B<-u>
+
+Disable memory checkpoint compression.
+
+=item B<-s> I<sshcommand>
+
+Use <sshcommand> instead of ssh.  String will be passed to sh.
+If empty, run <host> instead of ssh <host> xl migrate-receive -r [-e].
+
+=item B<-e>
+
+On the new host, do not wait in the background (on <host>) for the death
+of the domain. See the corresponding option of the I<create> subcommand.
+
+=back
+
 =item B<pause> I<domain-id>
 
 Pause a domain.  When in a paused state the domain will still consume
diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 tools/libxl/xl.h
--- a/tools/libxl/xl.h	Thu May 17 12:37:07 2012 -0700
+++ b/tools/libxl/xl.h	Thu May 17 12:37:10 2012 -0700
@@ -95,6 +95,7 @@
 int main_getenforce(int argc, char **argv);
 int main_setenforce(int argc, char **argv);
 int main_loadpolicy(int argc, char **argv);
+int main_remus(int argc, char **argv);
 
 void help(const char *command);
 
diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c	Thu May 17 12:37:07 2012 -0700
+++ b/tools/libxl/xl_cmdimpl.c	Thu May 17 12:37:10 2012 -0700
@@ -2966,7 +2966,7 @@
 }
 
 static void migrate_receive(int debug, int daemonize, int monitor,
-                            int send_fd, int recv_fd)
+                            int send_fd, int recv_fd, int remus)
 {
     int rc, rc2;
     char rc_buf;
@@ -3001,6 +3001,41 @@
         exit(-rc);
     }
 
+    if (remus) {
+        /* If we are here, it means that the sender (primary) has crashed.
+         * TODO: Split-Brain Check.
+         */
+        fprintf(stderr, "migration target: Remus Failover for domain %u\n",
+                domid);
+
+        /*
+         * If domain renaming fails, let's just continue (as we need the domain
+         * to be up & dom names may not matter much, as long as it's reachable
+         * over network).
+         *
+         * If domain unpausing fails, destroy domain ? Or is it better to have
+         * a consistent copy of the domain (memory, cpu state, disk)
+         * on at least one physical host ? Right now, let's just leave the domain
+         * as is and let the Administrator decide (or troubleshoot).
+         */
+        if (migration_domname) {
+            rc = libxl_domain_rename(ctx, domid, migration_domname,
+                                     common_domname);
+            if (rc)
+                fprintf(stderr, "migration target (Remus): "
+                        "Failed to rename domain from %s to %s:%d\n",
+                        migration_domname, common_domname, rc);
+        }
+
+        rc = libxl_domain_unpause(ctx, domid);
+        if (rc)
+            fprintf(stderr, "migration target (Remus): "
+                    "Failed to unpause domain %s (id: %u):%d\n",
+                    common_domname, domid, rc);
+
+        exit(rc ? -ERROR_FAIL: 0);
+    }
+
     fprintf(stderr, "migration target: Transfer complete,"
             " requesting permission to start domain.\n");
 
@@ -3128,10 +3163,10 @@
 
 int main_migrate_receive(int argc, char **argv)
 {
-    int debug = 0, daemonize = 1, monitor = 1;
+    int debug = 0, daemonize = 1, monitor = 1, remus = 0;
     int opt;
 
-    while ((opt = def_getopt(argc, argv, "Fed", "migrate-receive", 0)) != -1) {
+    while ((opt = def_getopt(argc, argv, "Fedr", "migrate-receive", 0)) != -1) {
         switch (opt) {
         case 0: case 2:
             return opt;
@@ -3145,6 +3180,9 @@
         case 'd':
             debug = 1;
             break;
+        case 'r':
+            remus = 1;
+            break;
         }
     }
 
@@ -3153,7 +3191,8 @@
         return 2;
     }
     migrate_receive(debug, daemonize, monitor,
-                    STDOUT_FILENO, STDIN_FILENO);
+                    STDOUT_FILENO, STDIN_FILENO,
+                    remus);
 
     return 0;
 }
@@ -6315,6 +6354,102 @@
     return ret;
 }
 
+int main_remus(int argc, char **argv)
+{
+    int opt, rc, daemonize = 1;
+    const char *ssh_command = "ssh";
+    char *host = NULL, *rune = NULL, *domain = NULL;
+    libxl_domain_remus_info r_info;
+    int send_fd = -1, recv_fd = -1;
+    pid_t child = -1;
+    uint8_t *config_data;
+    int config_len;
+
+    memset(&r_info, 0, sizeof(libxl_domain_remus_info));
+    /* Defaults */
+    r_info.interval = 200;
+    r_info.blackhole = 0;
+    r_info.compression = 1;
+
+    while ((opt = def_getopt(argc, argv, "bui:s:e", "remus", 2)) != -1) {
+        switch (opt) {
+        case 0: case 2:
+            return opt;
+
+        case 'i':
+            r_info.interval = atoi(optarg);
+            break;
+        case 'b':
+            r_info.blackhole = 1;
+            break;
+        case 'u':
+            r_info.compression = 0;
+            break;
+        case 's':
+            ssh_command = optarg;
+            break;
+        case 'e':
+            daemonize = 0;
+            break;
+        }
+    }
+
+    domain = argv[optind];
+    host = argv[optind + 1];
+
+    if (r_info.blackhole) {
+        find_domain(domain);
+        send_fd = open("/dev/null", O_RDWR, 0644);
+        if (send_fd < 0) {
+            perror("failed to open /dev/null");
+            exit(-1);
+        }
+    } else {
+
+        if (!ssh_command[0]) {
+            rune = host;
+        } else {
+            if (asprintf(&rune, "exec %s %s xl migrate-receive -r %s",
+                         ssh_command, host,
+                         daemonize ? "" : " -e") < 0)
+                return 1;
+        }
+
+        save_domain_core_begin(domain, NULL, &config_data, &config_len);
+
+        if (!config_len) {
+            fprintf(stderr, "No config file stored for running domain and "
+                    "none supplied - cannot start remus.\n");
+            exit(1);
+        }
+
+        child = create_migration_child(rune, &send_fd, &recv_fd);
+
+        migrate_do_preamble(send_fd, recv_fd, child, config_data, config_len,
+                            rune);
+    }
+
+    /* Point of no return */
+    rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd);
+
+    /* If we are here, it means backup has failed/domain suspend failed.
+     * Try to resume the domain and exit gracefully.
+     * TODO: Split-Brain check.
+     */
+    fprintf(stderr, "remus sender: libxl_domain_suspend failed"
+            " (rc=%d)\n", rc);
+
+    if (rc == ERROR_GUEST_TIMEDOUT)
+        fprintf(stderr, "Failed to suspend domain at primary.\n");
+    else {
+        fprintf(stderr, "Remus: Backup failed? resuming domain at primary.\n");
+        libxl_domain_resume(ctx, domid, 1);
+    }
+
+    close(send_fd);
+    return -ERROR_FAIL;
+}
+
 /*
  * Local variables:
  * mode: C
diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 tools/libxl/xl_cmdtable.c
--- a/tools/libxl/xl_cmdtable.c	Thu May 17 12:37:07 2012 -0700
+++ b/tools/libxl/xl_cmdtable.c	Thu May 17 12:37:10 2012 -0700
@@ -427,6 +427,20 @@
       "Loads a new policy int the Flask Xen security module",
       "<policy file>",
     },
+    { "remus",
+      &main_remus, 0, 1,
+      "Enable Remus HA for domain",
+      "[options] <Domain> [<host>]",
+      "-i MS                   Checkpoint domain memory every MS milliseconds (def. 200ms).\n"
+      "-b                      Replicate memory checkpoints to /dev/null (blackhole)\n"
+      "-u                      Disable memory checkpoint compression.\n"
+      "-s <sshcommand>         Use <sshcommand> instead of ssh.  String will be passed\n"
+      "                        to sh. If empty, run <host> instead of \n"
+      "                        ssh <host> xl migrate-receive -r [-e]\n"
+      "-e                      Do not wait in the background (on <host>) for the death\n"
+      "                        of the domain."
+
+    },
 };
 
 int cmdtable_len = sizeof(cmd_table)/sizeof(struct cmd_spec);

Ian Campbell

2012-May-25 16:59 UTC

Re: [PATCH 2 of 2 V6] libxl: Remus - xl remus command

On Thu, 2012-05-17 at 20:48 +0100, Shriram Rajagopalan wrote:
> diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 docs/man/xl.pod.1
> --- a/docs/man/xl.pod.1	Thu May 17 12:37:07 2012 -0700
> +++ b/docs/man/xl.pod.1	Thu May 17 12:37:10 2012 -0700
> @@ -381,6 +381,41 @@
>  
>  =back
>  
> +=item B<remus> [I<OPTIONS>] I<domain-id> I<host>
> +
> +Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
> +mechanism between the two hosts.
> +
> [...]
> +
> +=item B<-b>
> +
> +Do not checkpoint the disk. Replicate memory checkpoints to /dev/null
> +(blackhole).  Network output buffering remains enabled (unless --no-net is
> +supplied).  Generally useful for debugging.

Unless I'm mistaken the current remus support in (lib)xl doesn't
implement either disk or networking replication (and --no-net doesn't
seem to exist), at least there are several TODOs to that effect in the
code.

Please can you send an incremental patch which corrects this.

I also think it would be worth mentioning in the intro that "xl remus"
as it stands is "proof-of-concept" or "early preview", "experimental" or
something along these lines, otherwise people will expect it to be a
complete solution, which it isn't.

More importantly I think the lack of STONITH functionality should be
highlighted, since it would be rather dangerous to deploy remus without
it.

Ian.

Shriram Rajagopalan

2012-May-28 00:39 UTC

Re: [PATCH 2 of 2 V6] libxl: Remus - xl remus command

On Fri, May 25, 2012 at 12:59 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2012-05-17 at 20:48 +0100, Shriram Rajagopalan wrote:
> > diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 docs/man/xl.pod.1
> > --- a/docs/man/xl.pod.1       Thu May 17 12:37:07 2012 -0700
> > +++ b/docs/man/xl.pod.1       Thu May 17 12:37:10 2012 -0700
> > @@ -381,6 +381,41 @@
> >
> >  =back
> >
> > +=item B<remus> [I<OPTIONS>] I<domain-id> I<host>
> > +
> > +Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
> > +mechanism between the two hosts.
> > +
> > [...]
>
> > +
> > +=item B<-b>
> > +
> > +Do not checkpoint the disk. Replicate memory checkpoints to /dev/null
> > +(blackhole).  Network output buffering remains enabled (unless --no-net is
> > +supplied).  Generally useful for debugging.
>
> Unless I'm mistaken the current remus support in (lib)xl doesn't
> implement either disk or networking replication (and --no-net doesn't
> seem to exist), at least there as several TODOs to that effect in the
> code.
>
> Please can you send an incremental patch which corrects this.
>
> I also think it would be worth mentioning in the intro that "xl remus"
> as it stands is "proof-of-concept" or "early preview", "experimental" or
> something along these lines, otherwise people will expect it to be a
> complete solution, which it isn't.
>

Sorry about that. I'll send out a patch. I had actually planned on some
network buffering support but didn't expect the initial framework patches
to get held up for so long. :(. In fact, even the network buffering module
has been available in mainline kernel (with libnl library support), for the
past 3 months.
But I guess it's too late now.

> More importantly I think the lack of STONITH functionality should be
> highlighted, since it would be rather dangerous to deploy remus without
> it.
>

I think this applies to both xend/xl. Remus traditionally has not had any
stonith functionality. And if you think about it, separating Remus from the
Failover Arbitration (STONITH) gives more flexibility
(e.g., kill Backup, in case replication was interrupted by some spurious
timeout, use custom or off-the-shelf stonith solutions, etc).

The only thing that was lacking is some sort of notification to an external
handler. For example, on suspected failure, both nodes could invoke some
FooBar.sh script which would return 0/1 (die/live) and act accordingly.
The onus is on the user who implements the FooBar.sh script to ensure that
it doesn't return 1 on both sides. :).
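
A minimal sketch of what invoking such an external handler might look like,
assuming a POSIX system. Nothing here exists in xl or libxl: run_arbiter and
the arbitration enum are hypothetical names, and the mapping of exit status
0 to "die" and 1 to "live" simply follows the convention described above.
Any other outcome (script missing, killed by a signal, unexpected exit code)
is reported as an error and left for the caller to handle; treating it like
"die" would be the cautious choice, but that is an assumption, not something
decided in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum arbitration { ARB_ERROR = -1, ARB_DIE = 0, ARB_LIVE = 1 };

/* Run the arbitration command given by argv (argv[0] is the path,
 * the array is NULL-terminated) and translate its exit status. */
static enum arbitration run_arbiter(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0)
        return ARB_ERROR;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);              /* exec failed; 127 maps to ARB_ERROR */
    }

    /* Wait for the child, retrying if interrupted by a signal */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return ARB_ERROR;
    }

    if (!WIFEXITED(status))
        return ARB_ERROR;

    switch (WEXITSTATUS(status)) {
    case 0:  return ARB_DIE;
    case 1:  return ARB_LIVE;
    default: return ARB_ERROR;
    }
}
```

Each node would call this on suspected failure and only keep (or resume) the
domain when the result is ARB_LIVE.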

In fact, I think I have a patch lying around somewhere that invokes an
arbitration script, which in turn talks to a Google App Engine instance.
This was done for the wide-area Remus paper.

Let me post that too.

thanks
shriram

> Ian.

Ian Campbell

2012-May-28 08:41 UTC

Re: [PATCH 2 of 2 V6] libxl: Remus - xl remus command

On Mon, 2012-05-28 at 01:39 +0100, Shriram Rajagopalan wrote:
> On Fri, May 25, 2012 at 12:59 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>         On Thu, 2012-05-17 at 20:48 +0100, Shriram Rajagopalan wrote:
>         > diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 docs/man/xl.pod.1
>         > --- a/docs/man/xl.pod.1       Thu May 17 12:37:07 2012 -0700
>         > +++ b/docs/man/xl.pod.1       Thu May 17 12:37:10 2012 -0700
>         > @@ -381,6 +381,41 @@
>         >
>         >  =back
>         >
>         > +=item B<remus> [I<OPTIONS>] I<domain-id> I<host>
>         > +
>         > +Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
>         > +mechanism between the two hosts.
>         > +
>         
>         > [...]
>         
>         > +
>         > +=item B<-b>
>         > +
>         > +Do not checkpoint the disk. Replicate memory checkpoints to /dev/null
>         > +(blackhole).  Network output buffering remains enabled (unless --no-net is
>         > +supplied).  Generally useful for debugging.
>         
>         
>         Unless I'm mistaken the current remus support in (lib)xl doesn't
>         implement either disk or networking replication (and --no-net doesn't
>         seem to exist), at least there as several TODOs to that effect in the
>         code.
>         
>         Please can you send an incremental patch which corrects this.
>         
>         I also think it would be worth mentioning in the intro that "xl remus"
>         as it stands is "proof-of-concept" or "early preview", "experimental" or
>         something along these lines, otherwise people will expect it to be a
>         complete solution, which it isn't.
>         
> 
> Sorry about that. I ll send out a patch.

Thanks.

>  I had actually planned on some
> network buffering support but didnt expect the initial framework patches
> to get held up for so long. :(. In fact, even the network buffering module is
> has been available in mainline kernel (with libnl library support), for the past 3 months.
> But I guess its too late now.

Yes, I'm afraid so, although that needn't stop you posting RFCs for 4.3.

>  
>         More importantly I think the lack of STONITH functionality should be
>         highlighted, since it would be rather dangerous to deploy remus without
>         it.
> 
> 
> I think this applies to both xend/xl. Remus traditionally has not had any
> stonith functionality. And if you think about it, separating Remus from the
> Failover Arbitration (STONITH) gives more flexibility 
> (e.g., kill Backup, in case replication was interrupted by some spurious timeout,
> use custom or off-the-shelf stonith solutions, etc).

So it sounds like some documentation is required for what you need to
build around the xm/xl remus support in order to have a fully functional
& safe system? Does anything like that exist? It doesn't seem to be
mentioned in http://nss.cs.ubc.ca/remus/doc.html. Could we add something
into the tree or at least add a pointer to something? The need for this
should also be highlighted in the xl man page I think, otherwise people
will think that all they need to do is run "xl remus", which they could
be forgiven for thinking after having read
http://nss.cs.ubc.ca/remus/doc.html.

> The only thing that was lacking is some sort of notification to an external handler.
> For e.g., on suspected failure, both nodes could invoke some FooBar.sh script which
> would return 0/1 (die/live) and act accordingly.  The onus is on the user who implements
> the FooBar.sh script, to ensure that it doesnt return 1 on both sides. :).
> 
> In fact, I think I have a patch lying around somewhere, that invokes an arbitration
> script, which in turn talks to a Google App engine instance. This was done for
> wide-area Remus paper.
> 
> Let me post that too.

Please.

Thanks,
Ian.
