This patch series introduces a basic framework to incorporate
Remus into the libxl toolstack. The only functionality currently
implemented is memory checkpointing.
These patches depend on
1. "V3 libxl: refactor suspend/resume code" patch series (patches
1,2,3,5,6)
(note: patches 1 & 2 have already been committed and others have been
acked by Ian Campbell)
2. V4 of "libxl: support suspend_cancel in domain_resume"
(message id: 47366457a52076b78c52.1328837508@athos.nss.cs.ubc.ca)
3. Stefano's "V4 libxl: save/restore qemu physmap"
Changes in V4:
* more explanation on blackhole replication in xl.pod
* moved comment on save_callbacks to xenguest.h
* rebased to current tip, removed useless comments.
Changes in V3:
* Rebased w.r.t Stefano's patches.
Changes in V2:
* Move libxl_domain_remus_start into the save_callbacks implementation patch
* return proper error codes instead of -1.
* Add documentation to docs/man/xl.pod.1
shriram
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
rshriram@cs.ubc.ca
2012-Feb-10 01:58 UTC
[PATCH 1 of 2 V4] libxl: Remus - suspend/postflush/commit callbacks
# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1328836395 28800
# Node ID 2cde33aba59badb3f2faa2d77fe5b18441c7f221
# Parent c94dbffb74b2e4b5ca8f9ad7ed13b3a3329b639c
libxl: Remus - suspend/postflush/commit callbacks
* Add libxl callback functions for Remus checkpoint suspend, postflush
(aka resume) and checkpoint commit callbacks.
* suspend callback is a stub that just bounces off
libxl__domain_suspend_common_callback - which suspends the domain and
saves the devices model state to a file.
* resume callback currently just resumes the domain (and the device model).
* commit callback just writes out the saved device model state to the
network and sleeps for the checkpoint interval.
* Introduce a new public API, libxl_domain_remus_start (currently a stub)
that sets up the network and disk buffer and initiates continuous
checkpointing.
* Future patches will augment these callbacks/functions with more
functionalities
like issuing network buffer plug/unplug commands, disk checkpoint commands,
etc.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
diff -r c94dbffb74b2 -r 2cde33aba59b tools/libxc/xenguest.h
--- a/tools/libxc/xenguest.h Thu Feb 09 16:54:37 2012 -0800
+++ b/tools/libxc/xenguest.h Thu Feb 09 17:13:15 2012 -0800
@@ -33,10 +33,29 @@
/* callbacks provided by xc_domain_save */
struct save_callbacks {
+ /* Called after expiration of checkpoint interval,
+ * to suspend the guest.
+ */
int (*suspend)(void* data);
- /* callback to rendezvous with external checkpoint functions */
+
+ /* Called after the guest''s dirty pages have been
+ * copied into an output buffer.
+ * Callback function resumes the guest & the device model,
+ * returns to xc_domain_save.
+ * xc_domain_save then flushes the output buffer, while the
+ * guest continues to run.
+ */
int (*postcopy)(void* data);
- /* returns:
+
+ /* Called after the memory checkpoint has been flushed
+ * out into the network. Typical actions performed in this
+ * callback include:
+ * (a) send the saved device model state (for HVM guests),
+ * (b) wait for checkpoint ack
+ * (c) release the network output buffer pertaining to the acked
checkpoint.
+ * (c) sleep for the checkpoint interval.
+ *
+ * returns:
* 0: terminate checkpointing gracefully
* 1: take another checkpoint */
int (*checkpoint)(void* data);
diff -r c94dbffb74b2 -r 2cde33aba59b tools/libxl/libxl.c
--- a/tools/libxl/libxl.c Thu Feb 09 16:54:37 2012 -0800
+++ b/tools/libxl/libxl.c Thu Feb 09 17:13:15 2012 -0800
@@ -540,6 +540,41 @@ libxl_vminfo * libxl_list_vm(libxl_ctx *
return ptr;
}
+/* TODO: Explicit Checkpoint acknowledgements via recv_fd. */
+int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
+ uint32_t domid, int send_fd, int recv_fd)
+{
+ GC_INIT(ctx);
+ libxl_domain_type type = libxl__domain_type(gc, domid);
+ int rc = 0;
+
+ if (info == NULL) {
+ LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
+ "No remus_info structure supplied for domain %d",
domid);
+ rc = ERROR_INVAL;
+ goto remus_fail;
+ }
+
+ /* TBD: Remus setup - i.e. attach qdisc, enable disk buffering, etc */
+
+ /* Point of no return */
+ rc = libxl__domain_suspend_common(gc, domid, send_fd, type, /* live */ 1,
+ /* debug */ 0, info);
+
+ /*
+ * With Remus, if we reach this point, it means either
+ * backup died or some network error occurred preventing us
+ * from sending checkpoints.
+ */
+
+ /* TBD: Remus cleanup - i.e. detach qdisc, release other
+ * resources.
+ */
+ remus_fail:
+ GC_FREE;
+ return rc;
+}
+
int libxl_domain_suspend(libxl_ctx *ctx, libxl_domain_suspend_info *info,
uint32_t domid, int fd)
{
@@ -549,7 +584,9 @@ int libxl_domain_suspend(libxl_ctx *ctx,
int debug = info != NULL && info->flags & XL_SUSPEND_DEBUG;
int rc = 0;
- rc = libxl__domain_suspend_common(gc, domid, fd, type, live, debug);
+ rc = libxl__domain_suspend_common(gc, domid, fd, type, live, debug,
+ /* No Remus */ NULL);
+
if (!rc && type == LIBXL_DOMAIN_TYPE_HVM)
rc = libxl__domain_save_device_model(gc, domid, fd);
GC_FREE;
diff -r c94dbffb74b2 -r 2cde33aba59b tools/libxl/libxl.h
--- a/tools/libxl/libxl.h Thu Feb 09 16:54:37 2012 -0800
+++ b/tools/libxl/libxl.h Thu Feb 09 17:13:15 2012 -0800
@@ -323,6 +323,8 @@ typedef int (*libxl_console_ready)(libxl
int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config,
libxl_console_ready cb, void *priv, uint32_t *domid);
int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
libxl_console_ready cb, void *priv, uint32_t *domid, int restore_fd);
void libxl_domain_config_dispose(libxl_domain_config *d_config);
+int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
+ uint32_t domid, int send_fd, int recv_fd);
int libxl_domain_suspend(libxl_ctx *ctx, libxl_domain_suspend_info *info,
uint32_t domid, int fd);
diff -r c94dbffb74b2 -r 2cde33aba59b tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c Thu Feb 09 16:54:37 2012 -0800
+++ b/tools/libxl/libxl_dom.c Thu Feb 09 17:13:15 2012 -0800
@@ -473,6 +473,8 @@ struct suspendinfo {
int hvm;
unsigned int flags;
int guest_responded;
+ int save_fd; /* Migration stream fd (for Remus) */
+ int interval; /* checkpoint interval (for Remus) */
};
static int libxl__domain_suspend_common_switch_qemu_logdirty(int domid,
unsigned int enable, void *data)
@@ -737,9 +739,43 @@ static int libxl__toolstack_save(uint32_
return 0;
}
+static int libxl__remus_domain_suspend_callback(void *data)
+{
+ /* TODO: Issue disk and network checkpoint reqs. */
+ return libxl__domain_suspend_common_callback(data);
+}
+
+static int libxl__remus_domain_resume_callback(void *data)
+{
+ struct suspendinfo *si = data;
+ libxl_ctx *ctx = libxl__gc_owner(si->gc);
+
+ /* Resumes the domain and the device model */
+ if (libxl_domain_resume(ctx, si->domid, /* Fast Suspend */1))
+ return 0;
+
+ /* TODO: Deal with disk. Start a new network output buffer */
+ return 1;
+}
+
+static int libxl__remus_domain_checkpoint_callback(void *data)
+{
+ struct suspendinfo *si = data;
+
+ /* This would go into tailbuf. */
+ if (si->hvm &&
+ libxl__domain_save_device_model(si->gc, si->domid,
si->save_fd))
+ return 0;
+
+ /* TODO: Wait for disk and memory ack, release network buffer */
+ usleep(si->interval * 1000);
+ return 1;
+}
+
int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
libxl_domain_type type,
- int live, int debug)
+ int live, int debug,
+ const libxl_domain_remus_info *r_info)
{
libxl_ctx *ctx = libxl__gc_owner(gc);
int flags;
@@ -770,10 +806,20 @@ int libxl__domain_suspend_common(libxl__
return ERROR_INVAL;
}
+ memset(&si, 0, sizeof(si));
flags = (live) ? XCFLAGS_LIVE : 0
| (debug) ? XCFLAGS_DEBUG : 0
| (hvm) ? XCFLAGS_HVM : 0;
+ if (r_info != NULL) {
+ si.interval = r_info->interval;
+ if (r_info->compression)
+ flags |= XCFLAGS_CHECKPOINT_COMPRESS;
+ si.save_fd = fd;
+ }
+ else
+ si.save_fd = -1;
+
si.domid = domid;
si.flags = flags;
si.hvm = hvm;
@@ -797,7 +843,13 @@ int libxl__domain_suspend_common(libxl__
}
memset(&callbacks, 0, sizeof(callbacks));
- callbacks.suspend = libxl__domain_suspend_common_callback;
+ if (r_info != NULL) {
+ callbacks.suspend = libxl__remus_domain_suspend_callback;
+ callbacks.postcopy = libxl__remus_domain_resume_callback;
+ callbacks.checkpoint = libxl__remus_domain_checkpoint_callback;
+ } else
+ callbacks.suspend = libxl__domain_suspend_common_callback;
+
callbacks.switch_qemu_logdirty =
libxl__domain_suspend_common_switch_qemu_logdirty;
callbacks.toolstack_save = libxl__toolstack_save;
callbacks.data = &si;
diff -r c94dbffb74b2 -r 2cde33aba59b tools/libxl/libxl_internal.h
--- a/tools/libxl/libxl_internal.h Thu Feb 09 16:54:37 2012 -0800
+++ b/tools/libxl/libxl_internal.h Thu Feb 09 17:13:15 2012 -0800
@@ -629,7 +629,8 @@ _hidden int libxl__domain_restore_common
int fd);
_hidden int libxl__domain_suspend_common(libxl__gc *gc, uint32_t domid, int fd,
libxl_domain_type type,
- int live, int debug);
+ int live, int debug,
+ const libxl_domain_remus_info
*r_info);
_hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t
domid);
_hidden int libxl__domain_suspend_device_model(libxl__gc *gc, uint32_t domid);
_hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid);
diff -r c94dbffb74b2 -r 2cde33aba59b tools/libxl/libxl_types.idl
--- a/tools/libxl/libxl_types.idl Thu Feb 09 16:54:37 2012 -0800
+++ b/tools/libxl/libxl_types.idl Thu Feb 09 17:13:15 2012 -0800
@@ -406,6 +406,12 @@ libxl_sched_sedf = Struct("sched_sedf",
("weight", integer),
], dispose_fn=None)
+libxl_domain_remus_info = Struct("domain_remus_info",[
+ ("interval", integer),
+ ("blackhole", bool),
+ ("compression", bool),
+ ])
+
libxl_event_type = Enumeration("event_type", [
(1, "DOMAIN_SHUTDOWN"),
(2, "DOMAIN_DEATH"),
rshriram@cs.ubc.ca
2012-Feb-10 01:58 UTC
[PATCH 2 of 2 V4] libxl: Remus - xl remus command
# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1328836781 28800
# Node ID 7cbe8d029c59d5ff44bafe8065fef07b6cd0126b
# Parent 2cde33aba59badb3f2faa2d77fe5b18441c7f221
libxl: Remus - xl remus command
xl remus acts as a frontend to enable remus for a given domain.
* At the moment, only memory checkpointing and blackhole replication is
supported. Support for disk checkpointing and network buffering will
be added in future.
* Replication is done over ssh connection currently (like live migration
with xl). Future versions will have an option to use simple tcp socket
based replication channel (for both Remus & live migration).
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
diff -r 2cde33aba59b -r 7cbe8d029c59 docs/man/xl.pod.1
--- a/docs/man/xl.pod.1 Thu Feb 09 17:13:15 2012 -0800
+++ b/docs/man/xl.pod.1 Thu Feb 09 17:19:41 2012 -0800
@@ -350,6 +350,41 @@ Send <config> instead of config file fro
=back
+=item B<remus> [I<OPTIONS>] I<domain-id> I<host>
+
+Enable Remus HA for domain. By default B<xl> relies on ssh as a transport
mechanism
+between the two hosts.
+
+B<OPTIONS>
+
+=over 4
+
+=item B<-i> I<MS>
+
+Checkpoint domain memory every MS milliseconds (default 200ms).
+
+=item B<-b>
+
+Do not checkpoint the disk. Replicate memory checkpoints to /dev/null
(blackhole).
+Network output buffering remains enabled (unless --no-net is supplied).
+Generally useful for debugging.
+
+=item B<-u>
+
+Disable memory checkpoint compression.
+
+=item B<-s> I<sshcommand>
+
+Use <sshcommand> instead of ssh. String will be passed to sh. If empty,
run
+<host> instead of ssh <host> xl migrate-receive -r [-e].
+
+=item B<-e>
+
+On the new host, do not wait in the background (on <host>) for the death
of the
+domain. See the corresponding option of the I<create> subcommand.
+
+=back
+
=item B<pause> I<domain-id>
Pause a domain. When in a paused state the domain will still consume
diff -r 2cde33aba59b -r 7cbe8d029c59 tools/libxl/xl.h
--- a/tools/libxl/xl.h Thu Feb 09 17:13:15 2012 -0800
+++ b/tools/libxl/xl.h Thu Feb 09 17:19:41 2012 -0800
@@ -94,6 +94,7 @@ int main_cpupoolnumasplit(int argc, char
int main_getenforce(int argc, char **argv);
int main_setenforce(int argc, char **argv);
int main_loadpolicy(int argc, char **argv);
+int main_remus(int argc, char **argv);
void help(const char *command);
diff -r 2cde33aba59b -r 7cbe8d029c59 tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c Thu Feb 09 17:13:15 2012 -0800
+++ b/tools/libxl/xl_cmdimpl.c Thu Feb 09 17:19:41 2012 -0800
@@ -2879,7 +2879,7 @@ static void core_dump_domain(const char
}
static void migrate_receive(int debug, int daemonize, int monitor,
- int send_fd, int recv_fd)
+ int send_fd, int recv_fd, int remus)
{
int rc, rc2;
char rc_buf;
@@ -2914,6 +2914,41 @@ static void migrate_receive(int debug, i
exit(-rc);
}
+ if (remus) {
+ /* If we are here, it means that the sender (primary) has crashed.
+ * TODO: Split-Brain Check.
+ */
+ fprintf(stderr, "migration target: Remus Failover for domain
%u\n",
+ domid);
+
+ /*
+ * If domain renaming fails, lets just continue (as we need the domain
+ * to be up & dom names may not matter much, as long as its
reachable
+ * over network).
+ *
+ * If domain unpausing fails, destroy domain ? Or is it better to have
+ * a consistent copy of the domain (memory, cpu state, disk)
+ * on atleast one physical host ? Right now, lets just leave the domain
+ * as is and let the Administrator decide (or troubleshoot).
+ */
+ if (migration_domname) {
+ rc = libxl_domain_rename(ctx, domid, migration_domname,
+ common_domname);
+ if (rc)
+ fprintf(stderr, "migration target (Remus): "
+ "Failed to rename domain from %s to %s:%d\n",
+ migration_domname, common_domname, rc);
+ }
+
+ rc = libxl_domain_unpause(ctx, domid);
+ if (rc)
+ fprintf(stderr, "migration target (Remus): "
+ "Failed to unpause domain %s (id: %u):%d\n",
+ common_domname, domid, rc);
+
+ exit(rc ? -ERROR_FAIL: 0);
+ }
+
fprintf(stderr, "migration target: Transfer complete,"
" requesting permission to start domain.\n");
@@ -3040,10 +3075,10 @@ int main_restore(int argc, char **argv)
int main_migrate_receive(int argc, char **argv)
{
- int debug = 0, daemonize = 1, monitor = 1;
+ int debug = 0, daemonize = 1, monitor = 1, remus = 0;
int opt;
- while ((opt = def_getopt(argc, argv, "Fed",
"migrate-receive", 0)) != -1) {
+ while ((opt = def_getopt(argc, argv, "Fedr",
"migrate-receive", 0)) != -1) {
switch (opt) {
case 0: case 2:
return opt;
@@ -3057,6 +3092,9 @@ int main_migrate_receive(int argc, char
case ''d'':
debug = 1;
break;
+ case ''r'':
+ remus = 1;
+ break;
}
}
@@ -3065,7 +3103,8 @@ int main_migrate_receive(int argc, char
return 2;
}
migrate_receive(debug, daemonize, monitor,
- STDOUT_FILENO, STDIN_FILENO);
+ STDOUT_FILENO, STDIN_FILENO,
+ remus);
return 0;
}
@@ -5999,6 +6038,102 @@ done:
return ret;
}
+int main_remus(int argc, char **argv)
+{
+ int opt, rc, daemonize = 1;
+ const char *ssh_command = "ssh";
+ char *host = NULL, *rune = NULL, *domain = NULL;
+ libxl_domain_remus_info r_info;
+ int send_fd = -1, recv_fd = -1;
+ pid_t child = -1;
+ uint8_t *config_data;
+ int config_len;
+
+ memset(&r_info, 0, sizeof(libxl_domain_remus_info));
+ /* Defaults */
+ r_info.interval = 200;
+ r_info.blackhole = 0;
+ r_info.compression = 1;
+
+ while ((opt = def_getopt(argc, argv, "bui:s:e",
"remus", 2)) != -1) {
+ switch (opt) {
+ case 0: case 2:
+ return opt;
+
+ case ''i'':
+ r_info.interval = atoi(optarg);
+ break;
+ case ''b'':
+ r_info.blackhole = 1;
+ break;
+ case ''u'':
+ r_info.compression = 0;
+ break;
+ case ''s'':
+ ssh_command = optarg;
+ break;
+ case ''e'':
+ daemonize = 0;
+ break;
+ }
+ }
+
+ domain = argv[optind];
+ host = argv[optind + 1];
+
+ if (r_info.blackhole) {
+ find_domain(domain);
+ send_fd = open("/dev/null", O_RDWR, 0644);
+ if (send_fd < 0) {
+ perror("failed to open /dev/null");
+ exit(-1);
+ }
+ } else {
+
+ if (!ssh_command[0]) {
+ rune = host;
+ } else {
+ if (asprintf(&rune, "exec %s %s xl migrate-receive -r
%s",
+ ssh_command, host,
+ daemonize ? "" : " -e") < 0)
+ return 1;
+ }
+
+ save_domain_core_begin(domain, NULL, &config_data,
&config_len);
+
+ if (!config_len) {
+ fprintf(stderr, "No config file stored for running domain and
"
+ "none supplied - cannot start remus.\n");
+ exit(1);
+ }
+
+ child = create_migration_child(rune, &send_fd, &recv_fd);
+
+ migrate_do_preamble(send_fd, recv_fd, child, config_data, config_len,
+ rune);
+ }
+
+ /* Point of no return */
+ rc = libxl_domain_remus_start(ctx, &r_info, domid, send_fd, recv_fd);
+
+ /* If we are here, it means backup has failed/domain suspend failed.
+ * Try to resume the domain and exit gracefully.
+ * TODO: Split-Brain check.
+ */
+ fprintf(stderr, "remus sender: libxl_domain_suspend failed"
+ " (rc=%d)\n", rc);
+
+ if (rc == ERROR_GUEST_TIMEDOUT)
+ fprintf(stderr, "Failed to suspend domain at primary.\n");
+ else {
+ fprintf(stderr, "Remus: Backup failed? resuming domain at
primary.\n");
+ libxl_domain_resume(ctx, domid, 1);
+ }
+
+ close(send_fd);
+ return -ERROR_FAIL;
+}
+
/*
* Local variables:
* mode: C
diff -r 2cde33aba59b -r 7cbe8d029c59 tools/libxl/xl_cmdtable.c
--- a/tools/libxl/xl_cmdtable.c Thu Feb 09 17:13:15 2012 -0800
+++ b/tools/libxl/xl_cmdtable.c Thu Feb 09 17:19:41 2012 -0800
@@ -412,6 +412,20 @@ struct cmd_spec cmd_table[] = {
"Loads a new policy int the Flask Xen security module",
"<policy file>",
},
+ { "remus",
+ &main_remus, 0,
+ "Enable Remus HA for domain",
+ "[options] <Domain> [<host>]",
+ "-i MS Checkpoint domain memory every MS
milliseconds (def. 200ms).\n"
+ "-b Replicate memory checkpoints to /dev/null
(blackhole)\n"
+ "-u Disable memory checkpoint
compression.\n"
+ "-s <sshcommand> Use <sshcommand> instead of
ssh. String will be passed\n"
+ " to sh. If empty, run <host> instead
of \n"
+ " ssh <host> xl migrate-receive -r
[-e]\n"
+ "-e Do not wait in the background (on
<host>) for the death\n"
+ " of the domain."
+
+ },
};
int cmdtable_len = sizeof(cmd_table)/sizeof(struct cmd_spec);
rshriram@cs.ubc.ca writes ("[Xen-devel] [PATCH 0 of 2 V4] libxl - Remus
support"):> This patch series introduces a basic framework to incorporate
> Remus into the libxl toolstack. The only functionality currently
> implemented is memory checkpointing.
Thanks. I''m afraid this doesn''t apply to current tip:
patching file tools/libxl/xl_cmdimpl.c
Hunk #1 FAILED at 2878
Hunk #5 FAILED at 3102
2 out of 6 hunks FAILED -- saving rejects to file tools/libxl/xl_cmdimpl.c.rej
While you''re refreshing it, could you rewrap the lines in xl.pod.1 to
be within 75-80 columns ?
Thanks,
Ian.
On Mon, Feb 20, 2012 at 11:04 AM, Ian Jackson <Ian.Jackson@eu.citrix.com>wrote:> rshriram@cs.ubc.ca writes ("[Xen-devel] [PATCH 0 of 2 V4] libxl - Remus > support"): > > This patch series introduces a basic framework to incorporate > > Remus into the libxl toolstack. The only functionality currently > > implemented is memory checkpointing. > > Thanks. I''m afraid this doesn''t apply to current tip: > >And it wont :), as these patches are not independent. As I mentioned in the introductory email, ''These patches depend on 1. "V3 libxl: refactor suspend/resume code" patch series (patches 1,2,3,5,6) (note: patches 1 & 2 have already been committed and others have been acked by Ian Campbell) 2. V4 of "libxl: support suspend_cancel in domain_resume" (message id: 47366457a52076b78c52.1328837508@athos.nss.cs.ubc.ca) 3. Stefano''s "V4 libxl: save/restore qemu physmap" '' patching file tools/libxl/xl_cmdimpl.c> Hunk #1 FAILED at 2878 > Hunk #5 FAILED at 3102 > 2 out of 6 hunks FAILED -- saving rejects to file > tools/libxl/xl_cmdimpl.c.rej > > While you''re refreshing it, could you rewrap the lines in xl.pod.1 to > be within 75-80 columns ? > >Certainly, if it comes to resending the patches. thanks shriram Thanks,> Ian. > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xensource.com/xen-devel