This patch series introduces a basic framework to incorporate Remus into
the libxl toolstack. The only functionality currently implemented is
memory checkpointing.

These patches depend on the "libxl: refactor suspend/resume code" patch series.

Changes in V6:
 * Rebase to current tip

Changes in V5:
 * Rebase to current tip

Changes in V4:
 * more explanation on blackhole replication in xl.pod
 * moved comment on save_callbacks to xenguest.h
 * rebased to current tip, removed useless comments.

Changes in V3:
 * Rebased w.r.t Stefano's patches.

Changes in V2:
 * Move libxl_domain_remus_start into the save_callbacks implementation patch
 * return proper error codes instead of -1.
 * Add documentation to docs/man/xl.pod.1

Shriram
Shriram Rajagopalan
2012-May-17 19:48 UTC
[PATCH 1 of 2 V6] libxl: Remus - suspend/postflush/commit callbacks
# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1337283427 25200
# Node ID 496ff6ce5bb63a2f034d2a861f34cfa8cbf06552
# Parent  24c462a07e167e4ce35a22197dbef74853b08359
libxl: Remus - suspend/postflush/commit callbacks

 * Add libxl callback functions for Remus checkpoint suspend, postflush
   (aka resume) and checkpoint commit callbacks.
 * suspend callback is a stub that just bounces off
   libxl__domain_suspend_common_callback - which suspends the domain and
   saves the devices model state to a file.
 * resume callback currently just resumes the domain (and the device model).
 * commit callback just writes out the saved device model state to the
   network and sleeps for the checkpoint interval.
 * Introduce a new public API, libxl_domain_remus_start (currently a stub)
   that sets up the network and disk buffer and initiates continuous
   checkpointing.
 * Future patches will augment these callbacks/functions with more
   functionalities like issuing network buffer plug/unplug commands, disk
   checkpoint commands, etc.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

diff -r 24c462a07e16 -r 496ff6ce5bb6 tools/libxc/xenguest.h
--- a/tools/libxc/xenguest.h    Thu May 17 12:37:05 2012 -0700
+++ b/tools/libxc/xenguest.h    Thu May 17 12:37:07 2012 -0700
@@ -33,10 +33,29 @@
 
 /* callbacks provided by xc_domain_save */
 struct save_callbacks {
+    /* Called after expiration of checkpoint interval,
+     * to suspend the guest.
+     */
     int (*suspend)(void* data);
-    /* callback to rendezvous with external checkpoint functions */
+
+    /* Called after the guest's dirty pages have been
+     *  copied into an output buffer.
+     * Callback function resumes the guest & the device model,
+     *  returns to xc_domain_save.
+     * xc_domain_save then flushes the output buffer, while the
+     *  guest continues to run.
+     */
     int (*postcopy)(void* data);
-    /* returns:
+
+    /* Called after the memory checkpoint has been flushed
+     * out into the network. Typical actions performed in this
+     * callback include:
+     *   (a) send the saved device model state (for HVM guests),
+     *   (b) wait for checkpoint ack
+     *   (c) release the network output buffer pertaining to the acked checkpoint.
+     *   (c) sleep for the checkpoint interval.
+     *
+     * returns:
      * 0: terminate checkpointing gracefully
      * 1: take another checkpoint */
     int (*checkpoint)(void* data);
diff -r 24c462a07e16 -r 496ff6ce5bb6 tools/libxl/libxl.c
--- a/tools/libxl/libxl.c       Thu May 17 12:37:05 2012 -0700
+++ b/tools/libxl/libxl.c       Thu May 17 12:37:07 2012 -0700
@@ -619,6 +619,41 @@
     return ptr;
 }
 
+/* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
+int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
+                             uint32_t domid, int send_fd, int recv_fd)
+{
+    GC_INIT(ctx);
+    libxl_domain_type type = libxl__domain_type(gc, domid);
+    int rc = 0;
+
+    if (info == NULL) {
+        LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
+                   "No remus_info structure supplied for domain %d", domid);
+        rc = ERROR_INVAL;
+        goto remus_fail;
+    }
+
+    /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+
+    /* Point of no return */
+    rc = libxl__domain_suspend_common(gc, domid, send_fd, type, /* live */ 1,
+                                      /* debug */ 0, info);
+
+    /*
+     * With Remus, if we reach this point, it means either
+     * backup died or some network error occurred preventing us
+     * from sending checkpoints.
+     */
+
+    /* TBD: Remus cleanup - i.e. detach qdisc, release other
+     * resources.
+     */
+ remus_fail:
+    GC_FREE;
+    return rc;
+}
+
 int libxl_domain_suspend(libxl_ctx *ctx, libxl_domain_suspend_info *info,
                          uint32_t domid, int fd)
 {
@@ -628,7 +663,9 @@
     int debug = info != NULL && info->flags & XL_SUSPEND_DEBUG;
     int rc = 0;
 
-    rc = libxl__domain_suspend_common(gc, domid, fd, type, live, debug);
+    rc = libxl__domain_suspend_common(gc, domid, fd, type, live, debug,
+                                      /* No Remus */ NULL);
+
     if (!rc && type == LIBXL_DOMAIN_TYPE_HVM)
         rc = libxl__domain_save_device_model(gc, domid, fd);
     GC_FREE;
diff -r 24c462a07e16 -r 496ff6ce5bb6 tools/libxl/libxl.h
--- a/tools/libxl/libxl.h       Thu May 17 12:37:05 2012 -0700
+++ b/tools/libxl/libxl.h       Thu May 17 12:37:07 2012 -0700
@@ -525,6 +525,8 @@
 void libxl_domain_config_init(libxl_domain_config *d_config);
 void libxl_domain_config_dispose(libxl_domain_config *d_config);
 
+int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
+                             uint32_t domid, int send_fd, int recv_fd);
 int libxl_domain_suspend(libxl_ctx *ctx, libxl_domain_suspend_info *info,
                          uint32_t domid, int fd);
 
diff -r 24c462a07e16 -r 496ff6ce5bb6 tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c   Thu May 17 12:37:05 2012 -0700
+++ b/tools/libxl/libxl_dom.c   Thu May 17 12:37:07 2012 -0700
@@ -566,6 +566,8 @@
     int hvm;
     unsigned int flags;
     int guest_responded;
+    int save_fd; /* Migration stream fd (for Remus) */
+    int interval; /* checkpoint interval (for Remus) */
 };
 
 static int libxl__domain_suspend_common_switch_qemu_logdirty(int domid, unsigned int enable, void *data)
@@ -848,9 +850,43 @@
     return 0;
 }
 
+static int libxl__remus_domain_suspend_callback(void *data)
+{
+    /* TODO: Issue disk and network checkpoint reqs. */
+    return libxl__domain_suspend_common_callback(data);
+}
+
+static int libxl__remus_domain_resume_callback(void *data)
+{
+    struct suspendinfo *si = data;
+    libxl_ctx *ctx = libxl__gc_owner(si->gc);
+
+    /* Resumes the domain and the device model */
+    if (libxl_domain_resume(ctx, si->domid, /* Fast Suspend */1))
+        return 0;
+
+    /* TODO: Deal with disk. Start a new network output buffer */
+    return 1;
+}
+
+static int libxl__remus_domain_checkpoint_callback(void *data)
+{
+    struct suspendinfo *si = data;
+
+    /* This would go into tailbuf. */
+    if (si->hvm &&
+        libxl__domain_save_device_model(si->gc, si->domid, si->save_fd))
+        return 0;
+
+    /* TODO: Wait for disk and memory ack, release network buffer */
+    usleep(si->interval * 1000);
+    return 1;
+}
+
 int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
                                  libxl_domain_type type,
-                                 int live, int debug)
+                                 int live, int debug,
+                                 const libxl_domain_remus_info *r_info)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     int flags;
@@ -881,10 +917,20 @@
         return ERROR_INVAL;
     }
 
+    memset(&si, 0, sizeof(si));
     flags = (live) ? XCFLAGS_LIVE : 0
           | (debug) ? XCFLAGS_DEBUG : 0
           | (hvm) ? XCFLAGS_HVM : 0;
 
+    if (r_info != NULL) {
+        si.interval = r_info->interval;
+        if (r_info->compression)
+            flags |= XCFLAGS_CHECKPOINT_COMPRESS;
+        si.save_fd = fd;
+    }
+    else
+        si.save_fd = -1;
+
     si.domid = domid;
     si.flags = flags;
     si.hvm = hvm;
@@ -908,7 +954,13 @@
     }
 
     memset(&callbacks, 0, sizeof(callbacks));
-    callbacks.suspend = libxl__domain_suspend_common_callback;
+    if (r_info != NULL) {
+        callbacks.suspend = libxl__remus_domain_suspend_callback;
+        callbacks.postcopy = libxl__remus_domain_resume_callback;
+        callbacks.checkpoint = libxl__remus_domain_checkpoint_callback;
+    } else
+        callbacks.suspend = libxl__domain_suspend_common_callback;
+
     callbacks.switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
     callbacks.toolstack_save = libxl__toolstack_save;
     callbacks.data = &si;
diff -r 24c462a07e16 -r 496ff6ce5bb6 tools/libxl/libxl_internal.h
--- a/tools/libxl/libxl_internal.h      Thu May 17 12:37:05 2012 -0700
+++ b/tools/libxl/libxl_internal.h      Thu May 17 12:37:07 2012 -0700
@@ -757,7 +757,8 @@
                                          int fd);
 _hidden int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
                                          libxl_domain_type type,
-                                         int live, int debug);
+                                         int live, int debug,
+                                         const libxl_domain_remus_info *r_info);
 _hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_suspend_device_model(libxl__gc *gc, uint32_t domid);
 _hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
diff -r 24c462a07e16 -r 496ff6ce5bb6 tools/libxl/libxl_types.idl
--- a/tools/libxl/libxl_types.idl       Thu May 17 12:37:05 2012 -0700
+++ b/tools/libxl/libxl_types.idl       Thu May 17 12:37:07 2012 -0700
@@ -454,6 +454,12 @@
     ("weight", integer),
     ])
 
+libxl_domain_remus_info = Struct("domain_remus_info",[
+    ("interval",     integer),
+    ("blackhole",    bool),
+    ("compression",  bool),
+    ])
+
 libxl_event_type = Enumeration("event_type", [
     (1, "DOMAIN_SHUTDOWN"),
     (2, "DOMAIN_DEATH"),
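The order in which the save code is expected to drive the three callbacks (suspend, then postcopy/resume, then checkpoint/commit) can be sketched as a small driver loop. This is only an illustration of the contract documented in the xenguest.h comments, not the xc_domain_save implementation: struct remus_callbacks, run_checkpoint_loop and the demo_* helpers are hypothetical names, and the sketch assumes every callback returns nonzero to continue (the patch shows this for postcopy and checkpoint; for suspend it is an assumption).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the three Remus-related members of
 * struct save_callbacks; not the real libxc type. */
struct remus_callbacks {
    int (*suspend)(void *data);    /* assumed: nonzero on success */
    int (*postcopy)(void *data);   /* 1 on success, 0 on failure */
    int (*checkpoint)(void *data); /* 1: take another checkpoint, 0: stop */
    void *data;
};

/* Runs checkpoint epochs until a callback asks to stop or max_epochs is
 * reached. Returns the number of epochs whose memory was flushed. */
static int run_checkpoint_loop(const struct remus_callbacks *cb,
                               int max_epochs)
{
    int epochs = 0;

    assert(cb && cb->suspend && cb->postcopy && cb->checkpoint);

    while (epochs < max_epochs) {
        if (!cb->suspend(cb->data))      /* pause the guest */
            break;
        /* ... dirty pages would be copied to an output buffer here ... */
        if (!cb->postcopy(cb->data))     /* resume guest + device model */
            break;
        /* ... the buffer would be flushed while the guest runs ... */
        epochs++;
        if (!cb->checkpoint(cb->data))   /* commit, then sleep for interval */
            break;
    }
    return epochs;
}

/* Demo callbacks: stop after a fixed budget of checkpoints. */
struct demo_state { int budget; int resumes; };

static int demo_suspend(void *d) { (void)d; return 1; }

static int demo_postcopy(void *d)
{
    ((struct demo_state *)d)->resumes++;
    return 1;
}

static int demo_checkpoint(void *d)
{
    struct demo_state *s = d;
    return --s->budget > 0;
}
```

With a budget of 3, the loop completes three epochs and resumes the guest three times before the checkpoint callback returns 0. In the patch itself the commit step additionally sends the device model state for HVM guests and sleeps via usleep(si->interval * 1000).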
Shriram Rajagopalan
2012-May-17 19:48 UTC
[PATCH 2 of 2 V6] libxl: Remus - xl remus command
# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1337283430 25200
# Node ID 92bf8bd9ae5783a8126ffae75da9425db7c6e3d0
# Parent  496ff6ce5bb63a2f034d2a861f34cfa8cbf06552
libxl: Remus - xl remus command

xl remus acts as a frontend to enable remus for a given domain.
 * At the moment, only memory checkpointing and blackhole replication is
   supported. Support for disk checkpointing and network buffering will
   be added in future.
 * Replication is done over ssh connection currently (like live migration
   with xl). Future versions will have an option to use simple tcp socket
   based replication channel (for both Remus & live migration).

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 docs/man/xl.pod.1
--- a/docs/man/xl.pod.1 Thu May 17 12:37:07 2012 -0700
+++ b/docs/man/xl.pod.1 Thu May 17 12:37:10 2012 -0700
@@ -381,6 +381,41 @@
 
 =back
 
+=item B<remus> [I<OPTIONS>] I<domain-id> I<host>
+
+Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
+mechanism between the two hosts.
+
+B<OPTIONS>
+
+=over 4
+
+=item B<-i> I<MS>
+
+Checkpoint domain memory every MS milliseconds (default 200ms).
+
+=item B<-b>
+
+Do not checkpoint the disk. Replicate memory checkpoints to /dev/null
+(blackhole). Network output buffering remains enabled (unless --no-net is
+supplied). Generally useful for debugging.
+
+=item B<-u>
+
+Disable memory checkpoint compression.
+
+=item B<-s> I<sshcommand>
+
+Use <sshcommand> instead of ssh. String will be passed to sh.
+If empty, run <host> instead of ssh <host> xl migrate-receive -r [-e].
+
+=item B<-e>
+
+On the new host, do not wait in the background (on <host>) for the death
+of the domain. See the corresponding option of the I<create> subcommand.
+
+=back
+
 =item B<pause> I<domain-id>
 
 Pause a domain. When in a paused state the domain will still consume
diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 tools/libxl/xl.h
--- a/tools/libxl/xl.h  Thu May 17 12:37:07 2012 -0700
+++ b/tools/libxl/xl.h  Thu May 17 12:37:10 2012 -0700
@@ -95,6 +95,7 @@
 int main_getenforce(int argc, char **argv);
 int main_setenforce(int argc, char **argv);
 int main_loadpolicy(int argc, char **argv);
+int main_remus(int argc, char **argv);
 
 void help(const char *command);
 
diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c  Thu May 17 12:37:07 2012 -0700
+++ b/tools/libxl/xl_cmdimpl.c  Thu May 17 12:37:10 2012 -0700
@@ -2966,7 +2966,7 @@
 }
 
 static void migrate_receive(int debug, int daemonize, int monitor,
-                            int send_fd, int recv_fd)
+                            int send_fd, int recv_fd, int remus)
 {
     int rc, rc2;
     char rc_buf;
@@ -3001,6 +3001,41 @@
         exit(-rc);
     }
 
+    if (remus) {
+        /* If we are here, it means that the sender (primary) has crashed.
+         * TODO: Split-Brain Check.
+         */
+        fprintf(stderr, "migration target: Remus Failover for domain %u\n",
+                domid);
+
+        /*
+         * If domain renaming fails, lets just continue (as we need the domain
+         * to be up & dom names may not matter much, as long as its reachable
+         * over network).
+         *
+         * If domain unpausing fails, destroy domain ? Or is it better to have
+         * a consistent copy of the domain (memory, cpu state, disk)
+         * on atleast one physical host ? Right now, lets just leave the domain
+         * as is and let the Administrator decide (or troubleshoot).
+         */
+        if (migration_domname) {
+            rc = libxl_domain_rename(ctx, domid, migration_domname,
+                                     common_domname);
+            if (rc)
+                fprintf(stderr, "migration target (Remus): "
+                        "Failed to rename domain from %s to %s:%d\n",
+                        migration_domname, common_domname, rc);
+        }
+
+        rc = libxl_domain_unpause(ctx, domid);
+        if (rc)
+            fprintf(stderr, "migration target (Remus): "
+                    "Failed to unpause domain %s (id: %u):%d\n",
+                    common_domname, domid, rc);
+
+        exit(rc ? -ERROR_FAIL: 0);
+    }
+
     fprintf(stderr, "migration target: Transfer complete,"
             " requesting permission to start domain.\n");
 
@@ -3128,10 +3163,10 @@
 
 int main_migrate_receive(int argc, char **argv)
 {
-    int debug = 0, daemonize = 1, monitor = 1;
+    int debug = 0, daemonize = 1, monitor = 1, remus = 0;
     int opt;
 
-    while ((opt = def_getopt(argc, argv, "Fed", "migrate-receive", 0)) != -1) {
+    while ((opt = def_getopt(argc, argv, "Fedr", "migrate-receive", 0)) != -1) {
         switch (opt) {
         case 0: case 2:
             return opt;
@@ -3145,6 +3180,9 @@
         case 'd':
             debug = 1;
             break;
+        case 'r':
+            remus = 1;
+            break;
         }
     }
 
@@ -3153,7 +3191,8 @@
         return 2;
     }
     migrate_receive(debug, daemonize, monitor,
-                    STDOUT_FILENO, STDIN_FILENO);
+                    STDOUT_FILENO, STDIN_FILENO,
+                    remus);
 
     return 0;
 }
@@ -6315,6 +6354,102 @@
     return ret;
 }
 
+int main_remus(int argc, char **argv)
+{
+    int opt, rc, daemonize = 1;
+    const char *ssh_command = "ssh";
+    char *host = NULL, *rune = NULL, *domain = NULL;
+    libxl_domain_remus_info r_info;
+    int send_fd = -1, recv_fd = -1;
+    pid_t child = -1;
+    uint8_t *config_data;
+    int config_len;
+
+    memset(&r_info, 0, sizeof(libxl_domain_remus_info));
+    /* Defaults */
+    r_info.interval = 200;
+    r_info.blackhole = 0;
+    r_info.compression = 1;
+
+    while ((opt = def_getopt(argc, argv, "bui:s:e", "remus", 2)) != -1) {
+        switch (opt) {
+        case 0: case 2:
+            return opt;
+
+        case 'i':
+            r_info.interval = atoi(optarg);
+            break;
+        case 'b':
+            r_info.blackhole = 1;
+            break;
+        case 'u':
+            r_info.compression = 0;
+            break;
+        case 's':
+            ssh_command = optarg;
+            break;
+        case 'e':
+            daemonize = 0;
+            break;
+        }
+    }
+
+    domain = argv[optind];
+    host = argv[optind + 1];
+
+    if (r_info.blackhole) {
+        find_domain(domain);
+        send_fd = open("/dev/null", O_RDWR, 0644);
+        if (send_fd < 0) {
+            perror("failed to open /dev/null");
+            exit(-1);
+        }
+    } else {
+
+        if (!ssh_command[0]) {
+            rune = host;
+        } else {
+            if (asprintf(&rune, "exec %s %s xl migrate-receive -r %s",
+                         ssh_command, host,
+                         daemonize ? "" : " -e") < 0)
+                return 1;
+        }
+
+        save_domain_core_begin(domain, NULL, &config_data, &config_len);
+
+        if (!config_len) {
+            fprintf(stderr, "No config file stored for running domain and "
+                    "none supplied - cannot start remus.\n");
+            exit(1);
+        }
+
+        child = create_migration_child(rune, &send_fd, &recv_fd);
+
+        migrate_do_preamble(send_fd, recv_fd, child, config_data, config_len,
+                            rune);
+    }
+
+    /* Point of no return */
+    rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd);
+
+    /* If we are here, it means backup has failed/domain suspend failed.
+     * Try to resume the domain and exit gracefully.
+     * TODO: Split-Brain check.
+     */
+    fprintf(stderr, "remus sender: libxl_domain_suspend failed"
+            " (rc=%d)\n", rc);
+
+    if (rc == ERROR_GUEST_TIMEDOUT)
+        fprintf(stderr, "Failed to suspend domain at primary.\n");
+    else {
+        fprintf(stderr, "Remus: Backup failed? resuming domain at primary.\n");
+        libxl_domain_resume(ctx, domid, 1);
+    }
+
+    close(send_fd);
+    return -ERROR_FAIL;
+}
+
 /*
  * Local variables:
  * mode: C
diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 tools/libxl/xl_cmdtable.c
--- a/tools/libxl/xl_cmdtable.c Thu May 17 12:37:07 2012 -0700
+++ b/tools/libxl/xl_cmdtable.c Thu May 17 12:37:10 2012 -0700
@@ -427,6 +427,20 @@
       "Loads a new policy int the Flask Xen security module",
       "<policy file>",
     },
+    { "remus",
+      &main_remus, 0, 1,
+      "Enable Remus HA for domain",
+      "[options] <Domain> [<host>]",
+      "-i MS                   Checkpoint domain memory every MS milliseconds (def. 200ms).\n"
+      "-b                      Replicate memory checkpoints to /dev/null (blackhole)\n"
+      "-u                      Disable memory checkpoint compression.\n"
+      "-s <sshcommand>         Use <sshcommand> instead of ssh. String will be passed\n"
+      "                        to sh. If empty, run <host> instead of \n"
+      "                        ssh <host> xl migrate-receive -r [-e]\n"
+      "-e                      Do not wait in the background (on <host>) for the death\n"
+      "                        of the domain."
+
+    },
 };
 
 int cmdtable_len = sizeof(cmd_table)/sizeof(struct cmd_spec);
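The way main_remus chooses the receiver command (the "rune") from the -s and -e options can be isolated into a small helper. This is a sketch only: build_remus_rune is a hypothetical name, it uses snprintf into a caller-supplied buffer instead of the patch's asprintf, and it simply mirrors the format string shown in the patch, including the trailing space after -r.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the rune selection in main_remus:
 *  - empty ssh_command: run <host> itself as the command;
 *  - otherwise: "exec <ssh> <host> xl migrate-receive -r", with " -e"
 *    appended when the receiver should not daemonize.
 * Returns the string length, or -1 if buf is too small. */
static int build_remus_rune(char *buf, size_t len, const char *ssh_command,
                            const char *host, int daemonize)
{
    int n;

    assert(buf && ssh_command && host);

    if (!ssh_command[0])
        n = snprintf(buf, len, "%s", host);
    else
        n = snprintf(buf, len, "exec %s %s xl migrate-receive -r %s",
                     ssh_command, host, daemonize ? "" : " -e");

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Reporting truncation as an error matters here because the resulting string is handed to sh; a silently truncated command would be worse than a failed start.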
On Thu, 2012-05-17 at 20:48 +0100, Shriram Rajagopalan wrote:
> diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 docs/man/xl.pod.1
> --- a/docs/man/xl.pod.1 Thu May 17 12:37:07 2012 -0700
> +++ b/docs/man/xl.pod.1 Thu May 17 12:37:10 2012 -0700
> @@ -381,6 +381,41 @@
>
> =back
>
> +=item B<remus> [I<OPTIONS>] I<domain-id> I<host>
> +
> +Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
> +mechanism between the two hosts.
> +
> [...]
> +
> +=item B<-b>
> +
> +Do not checkpoint the disk. Replicate memory checkpoints to /dev/null
> +(blackhole). Network output buffering remains enabled (unless --no-net is
> +supplied). Generally useful for debugging.

Unless I'm mistaken the current remus support in (lib)xl doesn't
implement either disk or networking replication (and --no-net doesn't
seem to exist), at least there are several TODOs to that effect in the
code.

Please can you send an incremental patch which corrects this.

I also think it would be worth mentioning in the intro that "xl remus"
as it stands is "proof-of-concept" or "early preview", "experimental" or
something along these lines, otherwise people will expect it to be a
complete solution, which it isn't.

More importantly I think the lack of STONITH functionality should be
highlighted, since it would be rather dangerous to deploy remus without
it.

Ian.
Shriram Rajagopalan
2012-May-28 00:39 UTC
Re: [PATCH 2 of 2 V6] libxl: Remus - xl remus command
On Fri, May 25, 2012 at 12:59 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:

> On Thu, 2012-05-17 at 20:48 +0100, Shriram Rajagopalan wrote:
> > diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 docs/man/xl.pod.1
> > --- a/docs/man/xl.pod.1 Thu May 17 12:37:07 2012 -0700
> > +++ b/docs/man/xl.pod.1 Thu May 17 12:37:10 2012 -0700
> > @@ -381,6 +381,41 @@
> >
> > =back
> >
> > +=item B<remus> [I<OPTIONS>] I<domain-id> I<host>
> > +
> > +Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
> > +mechanism between the two hosts.
> > +
> > [...]
> > +
> > +=item B<-b>
> > +
> > +Do not checkpoint the disk. Replicate memory checkpoints to /dev/null
> > +(blackhole). Network output buffering remains enabled (unless --no-net is
> > +supplied). Generally useful for debugging.
>
> Unless I'm mistaken the current remus support in (lib)xl doesn't
> implement either disk or networking replication (and --no-net doesn't
> seem to exist), at least there are several TODOs to that effect in the
> code.
>
> Please can you send an incremental patch which corrects this.
>
> I also think it would be worth mentioning in the intro that "xl remus"
> as it stands is "proof-of-concept" or "early preview", "experimental" or
> something along these lines, otherwise people will expect it to be a
> complete solution, which it isn't.

Sorry about that. I'll send out a patch. I had actually planned on some
network buffering support but didn't expect the initial framework patches
to get held up for so long. :(. In fact, the network buffering module has
even been available in the mainline kernel (with libnl library support)
for the past 3 months. But I guess it's too late now.

> More importantly I think the lack of STONITH functionality should be
> highlighted, since it would be rather dangerous to deploy remus without
> it.

I think this applies to both xend/xl. Remus traditionally has not had any
STONITH functionality. And if you think about it, separating Remus from the
Failover Arbitration (STONITH) gives more flexibility (e.g., kill Backup
in case replication was interrupted by some spurious timeout, use custom or
off-the-shelf STONITH solutions, etc).

The only thing that was lacking is some sort of notification to an external
handler. For example, on suspected failure, both nodes could invoke some
FooBar.sh script which would return 0/1 (die/live) and act accordingly. The
onus is on the user who implements the FooBar.sh script to ensure that it
doesn't return 1 on both sides. :).

In fact, I think I have a patch lying around somewhere that invokes an
arbitration script, which in turn talks to a Google App Engine instance.
This was done for the wide-area Remus paper.

Let me post that too.

thanks
shriram
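The external-handler idea above (each node runs a script on suspected failure and acts on its 0/1 exit status) could look roughly like the following. This is a hypothetical sketch, not the patch mentioned in the mail: ask_arbiter and the verdict enum are made-up names, the 0 = die / 1 = live convention is taken from the message, and the sketch assumes a POSIX system where system() runs the command through sh.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Verdicts follow the convention from the mail: exit status 0 means
 * "die", 1 means "live". Anything else is treated as an error. */
enum arbitration_verdict {
    VERDICT_ERROR = -1,
    VERDICT_DIE   = 0,
    VERDICT_LIVE  = 1
};

/* Runs an arbitration command (e.g. a user-supplied script) and maps
 * its exit status to a verdict. Hypothetical helper, POSIX only. */
static enum arbitration_verdict ask_arbiter(const char *command)
{
    int status;

    assert(command != NULL);

    status = system(command);
    if (status == -1 || !WIFEXITED(status))
        return VERDICT_ERROR;   /* could not run, or killed by a signal */

    switch (WEXITSTATUS(status)) {
    case 0:  return VERDICT_DIE;
    case 1:  return VERDICT_LIVE;
    default: return VERDICT_ERROR;
    }
}
```

A caller would most likely treat VERDICT_ERROR the same as VERDICT_DIE, so that a broken or unreachable arbiter can never leave both nodes believing they should run the domain.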
On Mon, 2012-05-28 at 01:39 +0100, Shriram Rajagopalan wrote:
> On Fri, May 25, 2012 at 12:59 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Thu, 2012-05-17 at 20:48 +0100, Shriram Rajagopalan wrote:
> > > diff -r 496ff6ce5bb6 -r 92bf8bd9ae57 docs/man/xl.pod.1
> > > --- a/docs/man/xl.pod.1 Thu May 17 12:37:07 2012 -0700
> > > +++ b/docs/man/xl.pod.1 Thu May 17 12:37:10 2012 -0700
> > > @@ -381,6 +381,41 @@
> > >
> > > =back
> > >
> > > +=item B<remus> [I<OPTIONS>] I<domain-id> I<host>
> > > +
> > > +Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
> > > +mechanism between the two hosts.
> > > +
> > > [...]
> > > +
> > > +=item B<-b>
> > > +
> > > +Do not checkpoint the disk. Replicate memory checkpoints to /dev/null
> > > +(blackhole). Network output buffering remains enabled (unless --no-net is
> > > +supplied). Generally useful for debugging.
> >
> > Unless I'm mistaken the current remus support in (lib)xl doesn't
> > implement either disk or networking replication (and --no-net doesn't
> > seem to exist), at least there are several TODOs to that effect in the
> > code.
> >
> > Please can you send an incremental patch which corrects this.
> >
> > I also think it would be worth mentioning in the intro that "xl remus"
> > as it stands is "proof-of-concept" or "early preview", "experimental" or
> > something along these lines, otherwise people will expect it to be a
> > complete solution, which it isn't.
>
> Sorry about that. I'll send out a patch.

Thanks.

> I had actually planned on some network buffering support but didn't
> expect the initial framework patches to get held up for so long. :(.
> In fact, the network buffering module has even been available in the
> mainline kernel (with libnl library support) for the past 3 months.
> But I guess it's too late now.

Yes, I'm afraid so, although that needn't stop you posting RFCs for 4.3.

> > More importantly I think the lack of STONITH functionality should be
> > highlighted, since it would be rather dangerous to deploy remus without
> > it.
>
> I think this applies to both xend/xl. Remus traditionally has not had any
> STONITH functionality. And if you think about it, separating Remus from the
> Failover Arbitration (STONITH) gives more flexibility (e.g., kill Backup
> in case replication was interrupted by some spurious timeout, use custom or
> off-the-shelf STONITH solutions, etc).

So it sounds like some documentation is required for what you need to
build around the xm/xl remus support in order to have a fully functional
& safe system? Does anything like that exist? It doesn't seem to be
mentioned in http://nss.cs.ubc.ca/remus/doc.html. Could we add something
into the tree or at least add a pointer to something?

The need for this should also be highlighted in the xl man page I think,
otherwise people will think that all they need to do is run "xl remus",
which they could be forgiven for thinking after having read
http://nss.cs.ubc.ca/remus/doc.html.

> The only thing that was lacking is some sort of notification to an external
> handler. For example, on suspected failure, both nodes could invoke some
> FooBar.sh script which would return 0/1 (die/live) and act accordingly. The
> onus is on the user who implements the FooBar.sh script to ensure that it
> doesn't return 1 on both sides. :).
>
> In fact, I think I have a patch lying around somewhere that invokes an
> arbitration script, which in turn talks to a Google App Engine instance.
> This was done for the wide-area Remus paper.
>
> Let me post that too.

Please.

Thanks,
Ian.