Brendan Cully
2006-Dec-15 06:38 UTC
[Xen-devel] [PATCH 00 of 10] Teach xm save to checkpoint a running domain
Here's some code I've been playing with lately that teaches Xen to do checkpoints of running domains. It adds a new flag (-c) to xm save that causes the domain to be restored to the runnable state after save instead of being destroyed. It also alters the checkpoint code in the guest to simply lock external devices instead of disconnecting them, and to only attempt to reconnect if the suspend hypercall returns in a new domain (detected by the hypercall return value). This alternate suspend path is triggered by a new shutdown code ('checkpoint') - if the -c flag is not specified the existing 'suspend' function is run, so this code shouldn't have any effect on existing functionality.

I'm not too sure about the last couple of patches in this series. Because the checkpointing domain doesn't disconnect before calling suspend, it retains a few references to pages it doesn't own. These trigger a PT race detector in xc_linux_save, which causes it to abort. So the last couple of patches explicitly identify the references I've found so far (shared_info and some grant table shared pages) and simply zero those PTEs during save, since they'll be recreated on restore. Finding the grant table pages is a bit fragile - I walk the page table loaded in CR3 at the time of suspend looking for the virtual address I've stowed in the suspend record. I've only got code for two-level page tables at the moment, since I'm not convinced this is the right approach. Under what circumstances would a non-live save have an unsafe PTE race? Maybe it's fine to simply zero these PTEs without checking them. Or maybe it'd be less fragile to get the owners of the pages from Xen and see if the guest has legitimate mappings to them?

Comments?

I'll post some truly horrible proof-of-concept code to create LVM snapshots at checkpoint time in a separate email.
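
For readers who want to picture the page-table walk described above, here is a rough Python sketch of a two-level (non-PAE x86, 10/10/12 split) lookup. It is only an illustration, not the code from the series: the `pages` dictionary is a hypothetical stand-in for mapping guest frames, and `walk_two_level` and `make_entry` are made-up names. The present bit and the 0xfffffff frame mask follow what the later patch uses.

```python
PAGE_SHIFT = 12
PAGE_PRESENT = 0x1
MFN_MASK = 0xfffffff  # same mask the series applies after shifting a PTE


def l2_index(va):
    """Index into the top-level table (upper 10 bits of a 32-bit address)."""
    return (va >> 22) & 0x3ff


def l1_index(va):
    """Index into the leaf table (middle 10 bits of a 32-bit address)."""
    return (va >> PAGE_SHIFT) & 0x3ff


def make_entry(mfn):
    """Build a present page-table entry pointing at machine frame mfn."""
    return (mfn << PAGE_SHIFT) | PAGE_PRESENT


def walk_two_level(pages, pt_mfn, va):
    """Return the MFN that va maps to, or None if it is not mapped.

    pages is a hypothetical dict from MFN to a list of 1024 entries,
    standing in for mapping the guest's page-table frames.
    """
    pde = pages[pt_mfn][l2_index(va)]
    if not pde & PAGE_PRESENT:
        return None
    leaf_mfn = (pde >> PAGE_SHIFT) & MFN_MASK
    pte = pages[leaf_mfn][l1_index(va)]
    if not pte & PAGE_PRESENT:
        return None
    return (pte >> PAGE_SHIFT) & MFN_MASK


def demo_walk():
    """Map one virtual address and look up a mapped and an unmapped one."""
    top = [0] * 1024
    leaf = [0] * 1024
    va = 0xC1234000
    top[l2_index(va)] = make_entry(0x200)
    leaf[l1_index(va)] = make_entry(0xABCDE)
    pages = {0x100: top, 0x200: leaf}
    return walk_two_level(pages, 0x100, va), walk_two_level(pages, 0x100, 0x1000)
```

The fragility mentioned in the message shows up here too: the walk only works if the frame passed as `pt_mfn` really is the top-level table and the table format is the two-level one.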
_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Brendan Cully
2006-Dec-15 06:38 UTC
[Xen-devel] [PATCH 01 of 10] Add resumedomain domctl to resume a domain after checkpoint
# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1166166342 28800
# Node ID df9ac19cdb9a6b3021010c8873911bb17f7bdc7a
# Parent 4d2ae322ef0294df2e3361179b48cb4c339a555f
Add resumedomain domctl to resume a domain after checkpoint.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>

diff -r 4d2ae322ef02 -r df9ac19cdb9a xen/common/domctl.c
--- a/xen/common/domctl.c	Thu Dec 14 17:25:38 2006 +0000
+++ b/xen/common/domctl.c	Thu Dec 14 23:05:42 2006 -0800
@@ -240,6 +240,31 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc
     }
     break;
 
+    case XEN_DOMCTL_resumedomain:
+    {
+        struct domain *d = find_domain_by_id(op->domain);
+        struct vcpu *v;
+
+        ret = -ESRCH;
+        if ( d != NULL )
+        {
+            ret = -EINVAL;
+            printk("Resuming domain %d\n", op->domain);
+            if ( (d != current->domain) && (d->vcpu[0] != NULL) &&
+                 test_bit(_DOMF_shutdown, &d->domain_flags) )
+            {
+                clear_bit(_DOMF_shutdown, &d->domain_flags);
+
+                for_each_vcpu (d, v)
+                    vcpu_wake (v);
+
+                ret = 0;
+            }
+            put_domain(d);
+        }
+    }
+    break;
+
     case XEN_DOMCTL_createdomain:
     {
         struct domain *d;
diff -r 4d2ae322ef02 -r df9ac19cdb9a xen/include/public/domctl.h
--- a/xen/include/public/domctl.h	Thu Dec 14 17:25:38 2006 +0000
+++ b/xen/include/public/domctl.h	Thu Dec 14 23:05:42 2006 -0800
@@ -61,6 +61,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_creat
 #define XEN_DOMCTL_destroydomain      2
 #define XEN_DOMCTL_pausedomain        3
 #define XEN_DOMCTL_unpausedomain      4
+#define XEN_DOMCTL_resumedomain      26
 #define XEN_DOMCTL_getdomaininfo      5
 struct xen_domctl_getdomaininfo {
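
As a reading aid, the checks in the new domctl case can be modelled in a few lines of Python. This is only a sketch of the decision order in the hunk above; the `Domain` class, the `domains` dictionary and `resume_domain` are hypothetical names, and waking a vcpu is reduced to flipping a boolean.

```python
import errno


class Domain:
    """Toy stand-in for the hypervisor's per-domain state."""

    def __init__(self, domid, vcpus=1, shutdown=False):
        self.domid = domid
        self.shutdown = shutdown           # models the _DOMF_shutdown flag
        self.vcpu_awake = [False] * vcpus  # one entry per vcpu


def resume_domain(domains, caller_id, target_id):
    """Mirror the order of checks in the resumedomain case."""
    d = domains.get(target_id)
    if d is None:
        return -errno.ESRCH
    # A domain cannot resume itself, must have at least one vcpu,
    # and must currently be marked as shut down.
    if target_id == caller_id or not d.vcpu_awake or not d.shutdown:
        return -errno.EINVAL
    d.shutdown = False
    d.vcpu_awake = [True] * len(d.vcpu_awake)
    return 0
```

Note that a second resume of the same domain fails with EINVAL, because the shutdown flag has already been cleared by the first call.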
Brendan Cully
2006-Dec-15 06:38 UTC
[Xen-devel] [PATCH 02 of 10] Export resumedomain domctl to libxc
# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1166166342 28800
# Node ID 08aa64728a7485274c5765968c77c07771ebbbf1
# Parent df9ac19cdb9a6b3021010c8873911bb17f7bdc7a
Export resumedomain domctl to libxc.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>

diff -r df9ac19cdb9a -r 08aa64728a74 tools/libxc/xc_domain.c
--- a/tools/libxc/xc_domain.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/libxc/xc_domain.c	Thu Dec 14 23:05:42 2006 -0800
@@ -86,6 +86,16 @@ int xc_domain_shutdown(int xc_handle,
 
  out1:
     return ret;
+}
+
+
+int xc_domain_resume(int xc_handle,
+                     uint32_t domid)
+{
+    DECLARE_DOMCTL;
+    domctl.cmd = XEN_DOMCTL_resumedomain;
+    domctl.domain = (domid_t)domid;
+    return do_domctl(xc_handle, &domctl);
 }
 
 
diff -r df9ac19cdb9a -r 08aa64728a74 tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/libxc/xenctrl.h	Thu Dec 14 23:05:42 2006 -0800
@@ -236,6 +236,18 @@ int xc_domain_destroy(int xc_handle,
 int xc_domain_destroy(int xc_handle,
                       uint32_t domid);
 
+
+/**
+ * This function resumes a suspended domain. The domain should have
+ * been previously suspended.
+ *
+ * @parm xc_handle a handle to an open hypervisor interface
+ * @parm domid the domain id to resume
+ * return 0 on success, -1 on failure
+ */
+int xc_domain_resume(int xc_handle,
+                     uint32_t domid);
+
 /**
  * This function will shutdown a domain. This is intended for use in
  * fully-virtualized domains where this operation is analogous to the
Brendan Cully
2006-Dec-15 06:38 UTC
[Xen-devel] [PATCH 03 of 10] Export xc_domain_resume to xend
# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1166166342 28800
# Node ID 76574bc1ca50ece678c606887558e9f910361ac5
# Parent 08aa64728a7485274c5765968c77c07771ebbbf1
Export xc_domain_resume to xend.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>

diff -r 08aa64728a74 -r 76574bc1ca50 tools/python/xen/lowlevel/xc/xc.c
--- a/tools/python/xen/lowlevel/xc/xc.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/python/xen/lowlevel/xc/xc.c	Thu Dec 14 23:05:42 2006 -0800
@@ -160,6 +160,10 @@ static PyObject *pyxc_domain_destroy(XcO
     return dom_op(self, args, xc_domain_destroy);
 }
 
+static PyObject *pyxc_domain_resume(XcObject *self, PyObject *args)
+{
+    return dom_op(self, args, xc_domain_resume);
+}
 
 static PyObject *pyxc_vcpu_setaffinity(XcObject *self,
                                        PyObject *args,
@@ -1031,6 +1035,13 @@ static PyMethodDef pyxc_methods[] = {
       METH_VARARGS, "\n"
       "Destroy a domain.\n"
       " dom [int]: Identifier of domain to be destroyed.\n\n"
+      "Returns: [int] 0 on success; -1 on error.\n" },
+
+    { "domain_resume",
+      (PyCFunction)pyxc_domain_resume,
+      METH_VARARGS, "\n"
+      "Resume execution of a suspended domain.\n"
+      " dom [int]: Identifier of domain to be resumed.\n\n"
       "Returns: [int] 0 on success; -1 on error.\n" },
 
     { "vcpu_setaffinity",
diff -r 08aa64728a74 -r 76574bc1ca50 tools/python/xen/xend/XendDomainInfo.py
--- a/tools/python/xen/xend/XendDomainInfo.py	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/python/xen/xend/XendDomainInfo.py	Thu Dec 14 23:05:42 2006 -0800
@@ -1525,6 +1525,15 @@ class XendDomainInfo:
 
         self.cleanupDomain()
 
+
+    def resumeDomain(self):
+        log.debug("XendDomainInfo.resumeDomain(%s)", str(self.domid))
+
+        try:
+            if self.domid is not None:
+                xc.domain_resume(self.domid)
+        except:
+            log.exception("XendDomainInfo.resume: xc.domain_resume failed on domain %s." % (str(self.domid)))
 
     #
     # Channels for xenstore and console
# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1166166342 28800
# Node ID 9c35e3a499a7a3eb95eaab616ded1e77d4676722
# Parent 76574bc1ca50ece678c606887558e9f910361ac5
Add XS_RESUME command.

This clears the shutdown flag for a domain in xenstore, allowing
subsequent shutdowns of the same domain to fire the appropriate
watches.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>

diff -r 76574bc1ca50 -r 9c35e3a499a7 tools/xenstore/xenstored_core.c
--- a/tools/xenstore/xenstored_core.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/xenstore/xenstored_core.c	Thu Dec 14 23:05:42 2006 -0800
@@ -164,6 +164,7 @@ static char *sockmsg_string(enum xsd_soc
 	case XS_WATCH_EVENT: return "WATCH_EVENT";
 	case XS_ERROR: return "ERROR";
 	case XS_IS_DOMAIN_INTRODUCED: return "XS_IS_DOMAIN_INTRODUCED";
+	case XS_RESUME: return "RESUME";
 	default:
 		return "**UNKNOWN**";
 	}
@@ -1265,6 +1266,10 @@ static void process_message(struct conne
 
 	case XS_GET_DOMAIN_PATH:
 		do_get_domain_path(conn, onearg(in));
+		break;
+
+	case XS_RESUME:
+		do_resume(conn, onearg(in));
 		break;
 
 	default:
diff -r 76574bc1ca50 -r 9c35e3a499a7 tools/xenstore/xenstored_domain.c
--- a/tools/xenstore/xenstored_domain.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/xenstore/xenstored_domain.c	Thu Dec 14 23:05:42 2006 -0800
@@ -395,6 +395,43 @@ void do_release(struct connection *conn,
 	send_ack(conn, XS_RELEASE);
 }
 
+void do_resume(struct connection *conn, const char *domid_str)
+{
+	struct domain *domain;
+	unsigned int domid;
+
+	if (!domid_str) {
+		send_error(conn, EINVAL);
+		return;
+	}
+
+	domid = atoi(domid_str);
+	if (!domid) {
+		send_error(conn, EINVAL);
+		return;
+	}
+
+	if (conn->id != 0) {
+		send_error(conn, EACCES);
+		return;
+	}
+
+	domain = find_domain_by_domid(domid);
+	if (!domain) {
+		send_error(conn, ENOENT);
+		return;
+	}
+
+	if (!domain->conn) {
+		send_error(conn, EINVAL);
+		return;
+	}
+
+	domain->shutdown = 0;
+
+	send_ack(conn, XS_RESUME);
+}
+
 void do_get_domain_path(struct connection *conn, const char *domid_str)
 {
 	char *path;
diff -r 76574bc1ca50 -r 9c35e3a499a7 tools/xenstore/xenstored_domain.h
--- a/tools/xenstore/xenstored_domain.h	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/xenstore/xenstored_domain.h	Thu Dec 14 23:05:42 2006 -0800
@@ -32,6 +32,9 @@ void do_release(struct connection *conn,
 void do_release(struct connection *conn, const char *domid_str);
 
 /* domid */
+void do_resume(struct connection *conn, const char *domid_str);
+
+/* domid */
 void do_get_domain_path(struct connection *conn, const char *domid_str);
 
 /* Returns the event channel handle */
diff -r 76574bc1ca50 -r 9c35e3a499a7 tools/xenstore/xs.c
--- a/tools/xenstore/xs.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/xenstore/xs.c	Thu Dec 14 23:05:42 2006 -0800
@@ -719,6 +719,12 @@ bool xs_release_domain(struct xs_handle
 	return xs_bool(single_with_domid(h, XS_RELEASE, domid));
 }
 
+/* clear the shutdown bit for the given domain */
+bool xs_resume_domain(struct xs_handle *h, unsigned int domid)
+{
+	return xs_bool(single_with_domid(h, XS_RESUME, domid));
+}
+
 char *xs_get_domain_path(struct xs_handle *h, unsigned int domid)
 {
 	char domid_str[MAX_STRLEN(domid)];
diff -r 76574bc1ca50 -r 9c35e3a499a7 tools/xenstore/xs.h
--- a/tools/xenstore/xs.h	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/xenstore/xs.h	Thu Dec 14 23:05:42 2006 -0800
@@ -133,6 +133,11 @@ bool xs_introduce_domain(struct xs_handl
 			 unsigned int domid,
 			 unsigned long mfn,
 			 unsigned int eventchn);
+/* Resume a domain.
+ * Clear the shutdown flag for this domain in the store.
+ */
+bool xs_resume_domain(struct xs_handle *h, unsigned int domid);
+
 /* Release a domain.
  * Tells the store domain to release the memory page to the domain.
  */
diff -r 76574bc1ca50 -r 9c35e3a499a7 xen/include/public/io/xs_wire.h
--- a/xen/include/public/io/xs_wire.h	Thu Dec 14 23:05:42 2006 -0800
+++ b/xen/include/public/io/xs_wire.h	Thu Dec 14 23:05:42 2006 -0800
@@ -45,7 +45,8 @@ enum xsd_sockmsg_type
     XS_SET_PERMS,
     XS_WATCH_EVENT,
     XS_ERROR,
-    XS_IS_DOMAIN_INTRODUCED
+    XS_IS_DOMAIN_INTRODUCED,
+    XS_RESUME
 };
 
 #define XS_WRITE_NONE "NONE"
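
The order of validation in do_resume is easy to lose in a diff, so here is a small Python model of it. It is a sketch under assumptions: `store` is a hypothetical dict keyed by domid, `c_atoi` only approximates C's atoi (leading digits, 0 when nothing parses), and error codes are returned as values rather than sent back on a connection.

```python
import errno
import re


def c_atoi(text):
    """Rough approximation of C atoi: leading integer, or 0 if none."""
    match = re.match(r"\s*([+-]?\d+)", text)
    return int(match.group(1)) if match else 0


def do_resume(store, conn_id, domid_str):
    """Follow the same sequence of checks as the xenstored handler."""
    if domid_str is None:
        return errno.EINVAL
    domid = c_atoi(domid_str)
    if not domid:
        # Both an unparsable string and domain 0 end up here.
        return errno.EINVAL
    if conn_id != 0:
        return errno.EACCES
    domain = store.get(domid)
    if domain is None:
        return errno.ENOENT
    if not domain["connected"]:
        return errno.EINVAL
    domain["shutdown"] = False
    return 0
```

One consequence of checking `!domid` after atoi is that a non-numeric argument and domain 0 are indistinguishable: both are rejected with EINVAL before the privilege check is reached.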
# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1166166342 28800
# Node ID 84b42490685d4cc9cf6aeea43bb4d90c31a20bc1
# Parent 9c35e3a499a7a3eb95eaab616ded1e77d4676722
Export XS_RESUME to xend.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>

diff -r 9c35e3a499a7 -r 84b42490685d tools/python/xen/lowlevel/xs/xs.c
--- a/tools/python/xen/lowlevel/xs/xs.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/python/xen/lowlevel/xs/xs.c	Thu Dec 14 23:05:42 2006 -0800
@@ -618,6 +618,33 @@ static PyObject *xspy_introduce_domain(X
     return none(result);
 }
 
+#define xspy_resume_domain_doc "\n" \
+	"Tell xenstore to clear its shutdown flag for a domain.\n" \
+	"This ensures that a subsequent shutdown will fire the\n" \
+	"appropriate watches.\n" \
+	" dom [int]: domain id\n" \
+	"\n" \
+	"Returns None on success.\n" \
+	"Raises xen.lowlevel.xs.Error on error.\n"
+
+static PyObject *xspy_resume_domain(XsHandle *self, PyObject *args)
+{
+    uint32_t dom;
+
+    struct xs_handle *xh = xshandle(self);
+    bool result = 0;
+
+    if (!xh)
+        return NULL;
+    if (!PyArg_ParseTuple(args, "i", &dom))
+        return NULL;
+
+    Py_BEGIN_ALLOW_THREADS
+    result = xs_resume_domain(xh, dom);
+    Py_END_ALLOW_THREADS
+
+    return none(result);
+}
 
 #define xspy_release_domain_doc "\n" \
 	"Tell xenstore to release its channel to a domain.\n" \
@@ -789,6 +816,7 @@ static PyMethodDef xshandle_methods[] =
     XSPY_METH(transaction_start, METH_NOARGS),
     XSPY_METH(transaction_end, METH_VARARGS | METH_KEYWORDS),
     XSPY_METH(introduce_domain, METH_VARARGS),
+    XSPY_METH(resume_domain, METH_VARARGS),
     XSPY_METH(release_domain, METH_VARARGS),
     XSPY_METH(close, METH_NOARGS),
     XSPY_METH(get_domain_path, METH_VARARGS),
diff -r 9c35e3a499a7 -r 84b42490685d tools/python/xen/xend/XendDomainInfo.py
--- a/tools/python/xen/xend/XendDomainInfo.py	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/python/xen/xend/XendDomainInfo.py	Thu Dec 14 23:05:42 2006 -0800
@@ -45,7 +45,7 @@ from xen.xend.XendError import XendError
 from xen.xend.XendError import XendError, VmError
 from xen.xend.XendDevices import XendDevices
 from xen.xend.xenstore.xstransact import xstransact, complete
-from xen.xend.xenstore.xsutil import GetDomainPath, IntroduceDomain
+from xen.xend.xenstore.xsutil import GetDomainPath, IntroduceDomain, ResumeDomain
 from xen.xend.xenstore.xswatch import xswatch
 from xen.xend.XendConstants import *
 from xen.xend.XendAPIConstants import *
@@ -1532,6 +1532,7 @@ class XendDomainInfo:
         try:
             if self.domid is not None:
                 xc.domain_resume(self.domid)
+                ResumeDomain(self.domid)
         except:
             log.exception("XendDomainInfo.resume: xc.domain_resume failed on domain %s." % (str(self.domid)))
 
diff -r 9c35e3a499a7 -r 84b42490685d tools/python/xen/xend/xenstore/xsutil.py
--- a/tools/python/xen/xend/xenstore/xsutil.py	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/python/xen/xend/xenstore/xsutil.py	Thu Dec 14 23:05:42 2006 -0800
@@ -24,3 +24,6 @@ def IntroduceDomain(domid, page, port):
 
 def GetDomainPath(domid):
     return xshandle().get_domain_path(domid)
+
+def ResumeDomain(domid):
+    return xshandle().resume_domain(domid)
Brendan Cully
2006-Dec-15 06:38 UTC
[Xen-devel] [PATCH 06 of 10] Make suspend hypercall return 1 when the domain has been resumed
# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1166166342 28800
# Node ID dc4d3d58b1d24199101c782a2890b03bfb82fe28
# Parent 84b42490685d4cc9cf6aeea43bb4d90c31a20bc1
Make suspend hypercall return 1 when the domain has been resumed.

This patch writes 1 into EAX when the domain has been resumed,
alerting the guest domain that it needs to reconnect to its back ends.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>

diff -r 84b42490685d -r dc4d3d58b1d2 tools/libxc/xc_linux_restore.c
--- a/tools/libxc/xc_linux_restore.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/libxc/xc_linux_restore.c	Thu Dec 14 23:05:42 2006 -0800
@@ -690,6 +690,8 @@ int xc_linux_restore(int xc_handle, int
         ERROR("Suspend record frame number is bad");
         goto out;
     }
+    /* HYPERVISOR_suspend returns 1 to let guest know it should reconnect */
+    ctxt.user_regs.eax = 1;
     ctxt.user_regs.edx = mfn = p2m[pfn];
     start_info = xc_map_foreign_range(
         xc_handle, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, mfn);
Brendan Cully
2006-Dec-15 06:38 UTC
[Xen-devel] [PATCH 07 of 10] Add new shutdown mode for checkpoint
# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1166166342 28800
# Node ID d39e577379a3375d7340ef265d472f957694d8a0
# Parent dc4d3d58b1d24199101c782a2890b03bfb82fe28
Add new shutdown mode for checkpoint.

When control/shutdown = checkpoint, invoke an alternate suspend path
that doesn't disconnect from back ends, and only reconnects when the
image has been restored into a new domain.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>

diff -r dc4d3d58b1d2 -r d39e577379a3 linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c	Thu Dec 14 23:05:42 2006 -0800
@@ -85,6 +85,20 @@ static void pre_suspend(void)
 		mfn_to_pfn(xen_start_info->console.domU.mfn);
 }
 
+static void pre_checkpoint(void)
+{
+	xen_start_info->store_mfn = mfn_to_pfn(xen_start_info->store_mfn);
+	xen_start_info->console.domU.mfn =
+		mfn_to_pfn(xen_start_info->console.domU.mfn);
+}
+
+static void post_checkpoint(void)
+{
+	xen_start_info->store_mfn = pfn_to_mfn(xen_start_info->store_mfn);
+	xen_start_info->console.domU.mfn =
+		pfn_to_mfn(xen_start_info->console.domU.mfn);
+}
+
 static void post_suspend(void)
 {
 	int i, j, k, fpp;
@@ -183,3 +197,70 @@ int __xen_suspend(void)
 
 	return err;
 }
+
+int __xen_checkpoint(void)
+{
+	int err;
+
+	extern void time_resume(void);
+
+	BUG_ON(smp_processor_id() != 0);
+	BUG_ON(in_interrupt());
+
+#if defined(__i386__) || defined(__x86_64__)
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
+		printk(KERN_WARNING "Cannot suspend in "
+		       "auto_translated_physmap mode.\n");
+		return -EOPNOTSUPP;
+	}
+#endif
+
+	err = smp_suspend();
+	if (err)
+		return err;
+
+	xenbus_lock();
+
+	preempt_disable();
+
+	mm_pin_all();
+	local_irq_disable();
+	preempt_enable();
+
+	pre_checkpoint();
+
+	/*
+	 * We'll stop somewhere inside this hypercall. When it returns,
+	 * we'll start resuming after the restore.
+	 */
+	err = HYPERVISOR_suspend(virt_to_mfn(xen_start_info));
+
+	if (err) {
+		/* We are resuming in a new domain -- reconnect */
+		post_suspend();
+
+		gnttab_resume();
+
+		irq_resume();
+
+		time_resume();
+
+		switch_idle_mm();
+
+		local_irq_enable();
+
+		xencons_resume();
+
+		xenbus_resume();
+	} else {
+		post_checkpoint();
+
+		local_irq_enable();
+
+		xenbus_unlock();
+	}
+
+	smp_resume();
+
+	return err;
+}
diff -r dc4d3d58b1d2 -r d39e577379a3 linux-2.6-xen-sparse/drivers/xen/core/reboot.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/reboot.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/linux-2.6-xen-sparse/drivers/xen/core/reboot.c	Thu Dec 14 23:05:42 2006 -0800
@@ -11,15 +11,16 @@ MODULE_LICENSE("Dual BSD/GPL");
 
 MODULE_LICENSE("Dual BSD/GPL");
 
-#define SHUTDOWN_INVALID  -1
-#define SHUTDOWN_POWEROFF  0
-#define SHUTDOWN_SUSPEND   2
+#define SHUTDOWN_INVALID    -1
+#define SHUTDOWN_POWEROFF    0
+#define SHUTDOWN_SUSPEND     2
 /* Code 3 is SHUTDOWN_CRASH, which we don't use because the domain can only
  * report a crash, not be instructed to crash!
  * HALT is the same as POWEROFF, as far as we're concerned. The tools use
  * the distinction when we return the reason code to them.
  */
-#define SHUTDOWN_HALT      4
+#define SHUTDOWN_HALT        4
+#define SHUTDOWN_CHECKPOINT  5
 
 /* Ignore multiple shutdown requests. */
 static int shutting_down = SHUTDOWN_INVALID;
@@ -29,8 +30,10 @@ static DECLARE_WORK(shutdown_work, __shu
 
 #ifdef CONFIG_XEN
 int __xen_suspend(void);
+int __xen_checkpoint(void);
 #else
 #define __xen_suspend() (void)0
+#define __xen_checkpoint() (void)0
 #endif
 
 static int shutdown_process(void *__unused)
@@ -61,7 +64,10 @@ static int shutdown_process(void *__unus
 
 static int xen_suspend(void *__unused)
 {
-	__xen_suspend();
+	if (shutting_down == SHUTDOWN_CHECKPOINT)
+		__xen_checkpoint();
+	else
+		__xen_suspend();
 	shutting_down = SHUTDOWN_INVALID;
 	return 0;
 }
@@ -84,7 +90,8 @@ static void __shutdown_handler(void *unu
 {
 	int err;
 
-	if (shutting_down != SHUTDOWN_SUSPEND)
+	if (shutting_down != SHUTDOWN_SUSPEND
+	    && shutting_down != SHUTDOWN_CHECKPOINT)
 		err = kernel_thread(shutdown_process, NULL,
 				    CLONE_FS | CLONE_FILES);
 	else
@@ -132,6 +139,8 @@ static void shutdown_handler(struct xenb
 		kill_proc(1, SIGINT, 1); /* interrupt init */
 	else if (strcmp(str, "suspend") == 0)
 		shutting_down = SHUTDOWN_SUSPEND;
+	else if (strcmp(str, "checkpoint") == 0)
+		shutting_down = SHUTDOWN_CHECKPOINT;
 	else if (strcmp(str, "halt") == 0)
 		shutting_down = SHUTDOWN_HALT;
 	else {
diff -r dc4d3d58b1d2 -r d39e577379a3 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c	Thu Dec 14 23:05:42 2006 -0800
@@ -727,6 +727,18 @@ void xenbus_suspend(void)
 }
 EXPORT_SYMBOL_GPL(xenbus_suspend);
 
+void xenbus_lock(void)
+{
+	xs_lock();
+}
+EXPORT_SYMBOL_GPL(xenbus_lock);
+
+void xenbus_unlock(void)
+{
+	xs_unlock();
+}
+EXPORT_SYMBOL_GPL(xenbus_unlock);
+
 void xenbus_resume(void)
 {
 	xb_init_comms();
diff -r dc4d3d58b1d2 -r d39e577379a3 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_xs.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_xs.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_xs.c	Thu Dec 14 23:05:42 2006 -0800
@@ -666,6 +666,18 @@ void unregister_xenbus_watch(struct xenb
 }
 EXPORT_SYMBOL_GPL(unregister_xenbus_watch);
 
+void xs_lock(void)
+{
+	down_write(&xs_state.suspend_mutex);
+	mutex_lock(&xs_state.request_mutex);
+}
+
+void xs_unlock(void)
+{
+	mutex_unlock(&xs_state.request_mutex);
+	up_write(&xs_state.suspend_mutex);
+}
+
 void xs_suspend(void)
 {
 	struct xenbus_watch *watch;
diff -r dc4d3d58b1d2 -r d39e577379a3 linux-2.6-xen-sparse/include/xen/xenbus.h
--- a/linux-2.6-xen-sparse/include/xen/xenbus.h	Thu Dec 14 23:05:42 2006 -0800
+++ b/linux-2.6-xen-sparse/include/xen/xenbus.h	Thu Dec 14 23:05:42 2006 -0800
@@ -158,6 +158,8 @@ void unregister_xenstore_notifier(struct
 int register_xenbus_watch(struct xenbus_watch *watch);
 void unregister_xenbus_watch(struct xenbus_watch *watch);
 
+void xs_lock(void);
+void xs_unlock(void);
 void xs_suspend(void);
 void xs_resume(void);
 
@@ -167,6 +169,8 @@ void *xenbus_dev_request_and_reply(struc
 /* Called from xen core code. */
 void xenbus_suspend(void);
 void xenbus_resume(void);
+void xenbus_lock(void);
+void xenbus_unlock(void);
 
 #define XENBUS_IS_ERR_READ(str) ({ \
 	if (!IS_ERR(str) && strlen(str) == 0) { \
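
The two outcomes of the checkpoint path can be summarised with a toy Python trace. This is only a sketch of the branch on the hypercall's return value in __xen_checkpoint: the step names are copied from the patch, but `checkpoint` and its `hypervisor_suspend` callback are hypothetical, and interrupt and preemption handling are left out.

```python
def checkpoint(hypervisor_suspend):
    """Return (rc, steps) for one pass through the checkpoint path.

    hypervisor_suspend is a hypothetical callable standing in for the
    suspend hypercall: 0 means the same domain carried on after the
    checkpoint, nonzero means the image was restored into a new domain.
    """
    steps = ["smp_suspend", "xenbus_lock", "pre_checkpoint"]
    rc = hypervisor_suspend()
    if rc:
        # Restored into a new domain: take the full reconnect path.
        steps += ["post_suspend", "gnttab_resume", "irq_resume",
                  "time_resume", "xencons_resume", "xenbus_resume"]
    else:
        # Same domain carries on: undo the MFN translation and unlock.
        steps += ["post_checkpoint", "xenbus_unlock"]
    steps.append("smp_resume")
    return rc, steps
```

Calling `checkpoint(lambda: 0)` gives the short path that only unlocks xenbus, while `checkpoint(lambda: 1)` walks the same reconnect steps as an ordinary restore.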
Brendan Cully
2006-Dec-15 06:38 UTC
[Xen-devel] [PATCH 08 of 10] Add xm save -c/--checkpoint option
# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1166166342 28800
# Node ID a5274ebef731512d9681c7b81667b509f2e5346a
# Parent d39e577379a3375d7340ef265d472f957694d8a0
Add xm save -c/--checkpoint option

xm save --checkpoint leaves the domain running after creating the
snapshot.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>

diff -r d39e577379a3 -r a5274ebef731 tools/python/xen/xend/XendCheckpoint.py
--- a/tools/python/xen/xend/XendCheckpoint.py	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/python/xen/xend/XendCheckpoint.py	Thu Dec 14 23:05:42 2006 -0800
@@ -51,7 +51,7 @@ def read_exact(fd, size, errmsg):
     return buf
 
 
-def save(fd, dominfo, network, live, dst):
+def save(fd, dominfo, network, live, dst, checkpoint=False):
     write_exact(fd, SIGNATURE, "could not write guest state file: signature")
 
     config = sxp.to_string(dominfo.sxpr())
@@ -83,7 +83,10 @@ def save(fd, dominfo, network, live, dst
             log.debug("In saveInputHandler %s", line)
             if line == "suspend":
                 log.debug("Suspending %d ...", dominfo.getDomid())
-                dominfo.shutdown('suspend')
+                if checkpoint:
+                    dominfo.shutdown('checkpoint')
+                else:
+                    dominfo.shutdown('suspend')
                 dominfo.waitForShutdown()
                 dominfo.migrateDevices(network, dst, DEV_MIGRATE_STEP2,
                                        domain_name)
@@ -96,7 +99,8 @@ def save(fd, dominfo, network, live, dst
 
         forkHelper(cmd, fd, saveInputHandler, False)
 
-        dominfo.destroyDomain()
+        if not checkpoint:
+            dominfo.destroyDomain()
         try:
             dominfo.setName(domain_name)
         except VmError:
@@ -105,6 +109,8 @@ def save(fd, dominfo, network, live, dst
             # persistent VM, we need the rename, and don't expect the
             # conflict. This needs more thought.
             pass
+        if checkpoint:
+            dominfo.resumeDomain()
 
     except Exception, exn:
         log.exception("Save failed on domain %s (%s).", domain_name,
diff -r d39e577379a3 -r a5274ebef731 tools/python/xen/xend/XendConstants.py
--- a/tools/python/xen/xend/XendConstants.py	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/python/xen/xend/XendConstants.py	Thu Dec 14 23:05:42 2006 -0800
@@ -21,18 +21,20 @@ from xen.xend.XendAPIConstants import *
 # Shutdown codes and reasons.
 #
 
-DOMAIN_POWEROFF = 0
-DOMAIN_REBOOT   = 1
-DOMAIN_SUSPEND  = 2
-DOMAIN_CRASH    = 3
-DOMAIN_HALT     = 4
+DOMAIN_POWEROFF   = 0
+DOMAIN_REBOOT     = 1
+DOMAIN_SUSPEND    = 2
+DOMAIN_CRASH      = 3
+DOMAIN_HALT       = 4
+DOMAIN_CHECKPOINT = 5
 
 DOMAIN_SHUTDOWN_REASONS = {
-    DOMAIN_POWEROFF: "poweroff",
-    DOMAIN_REBOOT  : "reboot",
-    DOMAIN_SUSPEND : "suspend",
-    DOMAIN_CRASH   : "crash",
-    DOMAIN_HALT    : "halt"
+    DOMAIN_POWEROFF  : "poweroff",
+    DOMAIN_REBOOT    : "reboot",
+    DOMAIN_SUSPEND   : "suspend",
+    DOMAIN_CRASH     : "crash",
+    DOMAIN_HALT      : "halt",
+    DOMAIN_CHECKPOINT: "checkpoint"
 }
 REVERSE_DOMAIN_SHUTDOWN_REASONS = \
     dict([(y, x) for x, y in DOMAIN_SHUTDOWN_REASONS.items()])
diff -r d39e577379a3 -r a5274ebef731 tools/python/xen/xend/XendDomain.py
--- a/tools/python/xen/xend/XendDomain.py	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/python/xen/xend/XendDomain.py	Thu Dec 14 23:05:42 2006 -0800
@@ -1134,7 +1134,7 @@ class XendDomain:
             dominfo.testDeviceComplete()
         sock.close()
 
-    def domain_save(self, domid, dst):
+    def domain_save(self, domid, dst, checkpoint):
         """Start saving a domain to file.
 
         @param domid: Domain ID or Name
@@ -1155,8 +1155,8 @@ class XendDomain:
 
             fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
             try:
-                # For now we don't support 'live checkpoint'
-                XendCheckpoint.save(fd, dominfo, False, False, dst)
+                XendCheckpoint.save(fd, dominfo, False, False, dst,
+                                    checkpoint=checkpoint)
             finally:
                 os.close(fd)
         except OSError, ex:
diff -r d39e577379a3 -r a5274ebef731 tools/python/xen/xend/XendDomainInfo.py
--- a/tools/python/xen/xend/XendDomainInfo.py	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/python/xen/xend/XendDomainInfo.py	Thu Dec 14 23:05:42 2006 -0800
@@ -828,7 +828,7 @@ class XendDomainInfo:
 
             reason = self.readDom('control/shutdown')
 
-            if reason and reason != 'suspend':
+            if reason and reason not in ('suspend', 'checkpoint'):
                 sst = self.readDom('xend/shutdown_start_time')
                 now = time.time()
                 if sst:
@@ -994,7 +994,7 @@ class XendDomainInfo:
 
                 self._clearRestart()
 
-                if reason == 'suspend':
+                if reason in ('suspend', 'checkpoint'):
                     self._stateSet(DOM_STATE_SUSPENDED)
                     # Don't destroy the domain. XendCheckpoint will do
                     # this once it has finished. However, stop watching
diff -r d39e577379a3 -r a5274ebef731 tools/python/xen/xm/main.py
--- a/tools/python/xen/xm/main.py	Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/python/xen/xm/main.py	Thu Dec 14 23:05:42 2006 -0800
@@ -97,7 +97,7 @@ SUBCOMMAND_HELP = {
     'reboot'      : ('<Domain> [-wa]', 'Reboot a domain.'),
     'restore'     : ('<CheckpointFile> [-p]',
                      'Restore a domain from a saved state.'),
-    'save'        : ('<Domain> <CheckpointFile>',
+    'save'        : ('[-c] <Domain> <CheckpointFile>',
                      'Save a domain state to restore later.'),
     'shutdown'    : ('<Domain> [-waRH]', 'Shutdown a domain.'),
     'top'         : ('', 'Monitor a host and the domains in real time.'),
@@ -224,6 +224,9 @@ SUBCOMMAND_OPTIONS = {
     'resume': (
        ('-p', '--paused', 'Do not unpause domain after resuming it'),
     ),
+    'save': (
+       ('-c', '--checkpoint', 'Leave domain running after creating snapshot'),
+    ),
     'restore': (
        ('-p', '--paused', 'Do not unpause domain after restoring it'),
     ),
@@ -531,21 +534,37 @@ def get_single_vm(dom):
 #########################################################################
 
 def xm_save(args):
-    arg_check(args, "save", 2)
-
-    try:
-        dominfo = parse_doms_info(server.xend.domain(args[0]))
+    arg_check(args, "save", 2, 3)
+
+    try:
+        (options, params) = getopt.gnu_getopt(args, 'c', ['checkpoint'])
+    except getopt.GetoptError, opterr:
+        err(opterr)
+        sys.exit(1)
+
+    checkpoint = False
+    for (k, v) in options:
+        if k in ['-c', '--checkpoint']:
+            checkpoint = True
+
+    if len(params) != 2:
+        err("Wrong number of parameters")
+        usage('save')
+        sys.exit(1)
+
+    try:
+        dominfo = parse_doms_info(server.xend.domain(params[0]))
     except xmlrpclib.Fault, ex:
         raise ex
 
     domid = dominfo['domid']
-    savefile = os.path.abspath(args[1])
+    savefile = os.path.abspath(params[1])
 
     if not os.access(os.path.dirname(savefile), os.W_OK):
         err("xm save: Unable to create file %s" % savefile)
         sys.exit(1)
 
-    server.xend.domain.save(domid, savefile)
+    server.xend.domain.save(domid, savefile, checkpoint)
 
 def xm_restore(args):
     arg_check(args, "restore", 1, 2)
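
To try the option parsing from xm_save in isolation, something like the following standalone sketch should do. It uses the same getopt.gnu_getopt call as the patch, written for current Python rather than the Python 2 of the tree; `parse_save_args` is a hypothetical helper, not a function in xm, and it raises instead of calling err() and sys.exit().

```python
import getopt


def parse_save_args(args):
    """Split xm-save style arguments into (checkpoint, domain, savefile)."""
    options, params = getopt.gnu_getopt(args, "c", ["checkpoint"])
    checkpoint = any(flag in ("-c", "--checkpoint") for flag, _ in options)
    if len(params) != 2:
        raise ValueError("expected <Domain> <CheckpointFile>")
    domain, savefile = params
    return checkpoint, domain, savefile
```

For example, `parse_save_args(["-c", "vm1", "/tmp/vm1.chk"])` yields a true checkpoint flag with the two positional parameters intact. By default gnu_getopt also accepts options after the positional arguments, which is presumably why the patch switches from indexing args directly to indexing params.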
Brendan Cully
2006-Dec-15 06:38 UTC
[Xen-devel] [PATCH 09 of 10] Advertise address of grant table shared pages in suspend record
# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1166166342 28800
# Node ID 9182ff9b291d7fef7e05c6899922e88c54d0e419
# Parent a5274ebef731512d9681c7b81667b509f2e5346a
Advertise address of grant table shared pages in suspend record.

A checkpointed guest keeps its mappings to the shared_info page and
grant table shared pages. To let xc_linux_save distinguish between
these legitimate mappings and page table races, export the addresses
in the suspend record. This patch puts the grant table shared page and
lengths into the start_info pt_base and nr_pt_frames fields, which are
otherwise unused after boot.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>

diff -r a5274ebef731 -r 9182ff9b291d linux-2.6-xen-sparse/drivers/xen/core/gnttab.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/gnttab.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/linux-2.6-xen-sparse/drivers/xen/core/gnttab.c	Thu Dec 14 23:05:42 2006 -0800
@@ -426,6 +426,12 @@ int gnttab_suspend(void)
 	return 0;
 }
 
+int gnttab_checkpoint(void)
+{
+	xen_start_info->pt_base = (unsigned long)shared;
+	xen_start_info->nr_pt_frames = NR_GRANT_FRAMES;
+}
+
 #else /* !CONFIG_XEN */
 
 #include <platform-pci.h>
diff -r a5274ebef731 -r 9182ff9b291d linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c	Thu Dec 14 23:05:42 2006 -0800
@@ -229,6 +229,8 @@ int __xen_checkpoint(void)
 
 	pre_checkpoint();
 
+	gnttab_checkpoint();
+
 	/*
 	 * We'll stop somewhere inside this hypercall. When it returns,
 	 * we'll start resuming after the restore.
diff -r a5274ebef731 -r 9182ff9b291d linux-2.6-xen-sparse/include/xen/gnttab.h
--- a/linux-2.6-xen-sparse/include/xen/gnttab.h	Thu Dec 14 23:05:42 2006 -0800
+++ b/linux-2.6-xen-sparse/include/xen/gnttab.h	Thu Dec 14 23:05:42 2006 -0800
@@ -116,6 +116,7 @@ void gnttab_grant_foreign_transfer_ref(g
 #endif
 
 int gnttab_suspend(void);
+int gnttab_checkpoint(void);
 int gnttab_resume(void);
 
 static inline void
Brendan Cully
2006-Dec-15 06:38 UTC
[Xen-devel] [PATCH 10 of 10] Ignore safe foreign maps in xc_linux_save
# HG changeset patch
# User Brendan Cully <brendan@cs.ubc.ca>
# Date 1166167313 28800
# Node ID 660b54dd9d6d7a8a33c583f6cabd4177c866d7b0
# Parent 9182ff9b291d7fef7e05c6899922e88c54d0e419
Ignore safe foreign maps in xc_linux_save.

When called via the checkpoint path, the guest retains references to
the start_info and grant table shared pages owned by Xen. Detect and
zero these in the save path instead of aborting due to an apparent
page table race.

Grant table pages are found by walking the guest page table mapped at
suspend time, and only done for two-level page tables in this patch.

Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>

diff -r 9182ff9b291d -r 660b54dd9d6d tools/libxc/xc_linux_save.c
--- a/tools/libxc/xc_linux_save.c Thu Dec 14 23:05:42 2006 -0800
+++ b/tools/libxc/xc_linux_save.c Thu Dec 14 23:21:53 2006 -0800
@@ -44,6 +44,9 @@ static xen_pfn_t *live_p2m = NULL;
 
 /* Live mapping of system MFN to PFN table. */
 static xen_pfn_t *live_m2p = NULL;
+
+/* References to xen pages held by guest - should not count as races */
+static unsigned long foreign_maps[6];
 
 /* grep fodder: machine_to_phys */
 
@@ -417,7 +420,8 @@ static int canonicalize_pagetable(unsign
                            const void *spage, void *dpage)
 {
 
-    int i, pte_last, xen_start, xen_end, race = 0;
+    int i, pte_last, xen_start, xen_end, race = 0;
+    unsigned long* foreign_map;
     uint64_t pte;
 
     /*
@@ -475,12 +479,22 @@ static int canonicalize_pagetable(unsign
 
             mfn = (pte >> PAGE_SHIFT) & 0xfffffff;
             if (!MFN_IS_IN_PSEUDOPHYS_MAP(mfn)) {
-                /* This will happen if the type info is stale which
-                   is quite feasible under live migration */
-                DPRINTF("PT Race: [%08lx,%d] pte=%llx, mfn=%08lx\n",
-                        type, i, (unsigned long long)pte, mfn);
+                /* zap foreign mappings which will be recreated on resume */
+                for (foreign_map = foreign_maps; *foreign_map; foreign_map++) {
+                    if (mfn == *foreign_map) {
+                        DPRINTF("Skipping legitimate mapping %08lx\n", mfn);
+                        pte = 0;
+                        break;
+                    }
+                }
+                if (pte) {
+                    /* This will happen if the type info is stale which
+                       is quite feasible under live migration */
+                    DPRINTF("PT Race: [%08lx,%d] pte=%llx, mfn=%08lx\n",
+                            type, i, (unsigned long long)pte, mfn);
+                    race = 1; /* inform the caller of race; fatal if !live */
+                }
                 pfn = 0; /* zap it - we'll retransmit this page later */
-                race = 1; /* inform the caller of race; fatal if !live */
             } else
                 pfn = mfn_to_pfn(mfn);
 
@@ -556,7 +570,99 @@ static xen_pfn_t *xc_map_m2p(int xc_hand
     return m2p;
 }
 
-
+static int virt_to_mfn(int xc_handle, int dom, unsigned long pt_mfn,
+                       unsigned long va, unsigned long* mfn)
+{
+    unsigned long* pde, *l2;
+    unsigned long pte, l2_mfn;
+    int rc = -1;
+
+    /* TODO: support for other than two-level page tables */
+    if (pt_levels != 2)
+        return -1;
+
+    if (!MFN_IS_IN_PSEUDOPHYS_MAP(pt_mfn)) {
+        DPRINTF("Bad CR3 MFN %08lx\n", pt_mfn);
+        return -1;
+    }
+
+    pde = xc_map_foreign_range(xc_handle, dom, PAGE_SIZE, PROT_READ, pt_mfn);
+    pte = pde[l2_table_offset(va)];
+    if (pte & _PAGE_PRESENT) {
+        l2_mfn = (pte >> PAGE_SHIFT) & 0xfffffff;
+        if (!MFN_IS_IN_PSEUDOPHYS_MAP(l2_mfn)) {
+            DPRINTF("Bad L2 MFN %08lx\n", pt_mfn);
+            munmap(pde, PAGE_SIZE);
+            return -1;
+        }
+        l2 = xc_map_foreign_range(xc_handle, dom, PAGE_SIZE, PROT_READ, l2_mfn);
+        pte = l2[l1_table_offset(va)];
+        if (pte & _PAGE_PRESENT) {
+            *mfn = (pte >> PAGE_SHIFT) & 0xfffffff;
+            rc = 0;
+            DPRINTF("VA %08lx maps to MFN %08lx\n", va, *mfn);
+        } else {
+            DPRINTF("MFN for %08lx not present\n", va);
+        }
+        munmap(l2, PAGE_SIZE);
+    } else {
+        DPRINTF("Page table for %08lx not present\n", va);
+    }
+
+    munmap(pde, PAGE_SIZE);
+
+    return rc;
+}
+
+/* record legitimate foreign mappings that shouldn't cause races
+ * when found in canonicalize_pagetables */
+static void find_foreign_maps(int xc_handle, int dom, unsigned long sif,
+                              vcpu_guest_context_t* ctxt)
+{
+    unsigned long* page;
+    unsigned long suspend_mfn, gt_va;
+    int gt_len;
+    int i;
+
+    memset(foreign_maps, 0, sizeof(foreign_maps));
+    /* shared info page */
+    foreign_maps[0] = sif;
+
+    suspend_mfn = ctxt->user_regs.edx;
+    if (!MFN_IS_IN_PSEUDOPHYS_MAP(suspend_mfn)) {
+        DPRINTF("Bad suspend record MFN %08lx\n", suspend_mfn);
+        return;
+    }
+
+    /* track down grant table shared pages by walking the current guest
+     * page table to find the mfns of their virtual addresses */
+    /* this is a little silly */
+    page = xc_map_foreign_range(xc_handle, dom, PAGE_SIZE, PROT_READ,
+                                suspend_mfn);
+    if (!page) {
+        DPRINTF("Failed to map suspend record\n");
+        return;
+    }
+
+    gt_va = ((start_info_t*)page)->pt_base;
+    gt_len = ((start_info_t*)page)->nr_pt_frames;
+    munmap(page, PAGE_SIZE);
+
+    if (gt_len > 4) {
+        /* probably not the grant table info (plain save) */
+        DPRINTF("No grant table info found in suspend record\n");
+        return;
+    }
+
+    DPRINTF("Grant table shared page base: %08lx, len: %d\n",
+            gt_va, gt_len);
+
+    for (i = 0; i < gt_len; i++) {
+        virt_to_mfn(xc_handle, dom, xen_cr3_to_pfn(ctxt->ctrlreg[3]),
+                    gt_va + (i << PAGE_SHIFT),
+                    foreign_maps + i + 1);
+    }
+}
 
 int xc_linux_save(int xc_handle, int io_fd, uint32_t dom, uint32_t max_iters,
                   uint32_t max_factor, uint32_t flags, int (*suspend)(int))
@@ -607,7 +713,6 @@ int xc_linux_save(int xc_handle, int io_
     unsigned long needed_to_fix = 0;
     unsigned long total_sent = 0;
 
-
 
     /* If no explicit control parameters given, use defaults */
     if(!max_iters)
@@ -813,6 +918,8 @@ int xc_linux_save(int xc_handle, int io_
         DPRINTF("Had %d unexplained entries in p2m table\n", err);
     }
 
+    /* record legitimate foreign mappings */
+    find_foreign_maps(xc_handle, dom, shared_info_frame, &ctxt);
 
     /* Start writing out the saved-domain record. */
 
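The walk that virt_to_mfn performs can be tried out in isolation. What follows is a minimal sketch, not the libxc code: it assumes 32-bit non-PAE paging (10 index bits per level, 4 KiB pages) and replaces xc_map_foreign_range with a toy array of frames. The names toy_memory and toy_virt_to_mfn are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT     12
#define PTES_PER_TABLE 1024u   /* assumption: 32-bit non-PAE, 10 bits per level */
#define PAGE_PRESENT   0x1u

/* Toy stand-in for machine memory: every frame is a table of 1024 entries.
 * The real tool would map each frame from the guest instead. */
typedef struct {
    uint32_t (*frames)[PTES_PER_TABLE];
    size_t nr_frames;
} toy_memory;

/* Top 10 bits of the address select the page directory entry. */
static unsigned dir_index(uint32_t va)
{
    return va >> 22;
}

/* Next 10 bits select the entry in the second-level table. */
static unsigned table_index(uint32_t va)
{
    return (va >> PAGE_SHIFT) & (PTES_PER_TABLE - 1);
}

/* Returns 0 and stores the frame number in *mfn, or -1 if va is not mapped
 * or a table frame lies outside the toy memory. */
int toy_virt_to_mfn(const toy_memory *mem, uint32_t dir_mfn,
                    uint32_t va, uint32_t *mfn)
{
    uint32_t pde, pte, table_mfn;

    assert(mem != NULL && mfn != NULL);

    if (dir_mfn >= mem->nr_frames)
        return -1;
    pde = mem->frames[dir_mfn][dir_index(va)];
    if (!(pde & PAGE_PRESENT))
        return -1;

    table_mfn = pde >> PAGE_SHIFT;
    if (table_mfn >= mem->nr_frames)
        return -1;
    pte = mem->frames[table_mfn][table_index(va)];
    if (!(pte & PAGE_PRESENT))
        return -1;

    *mfn = pte >> PAGE_SHIFT;
    return 0;
}
```

For a virtual address whose directory index is 3 and table index is 5, the lookup reads entry 3 of the directory frame, follows it to the table frame, and returns the frame number stored in entry 5. The bounds checks on each frame number play the role of the MFN_IS_IN_PSEUDOPHYS_MAP checks in the patch.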
Steven Hand
2006-Dec-15 08:07 UTC
Re: [Xen-devel] [PATCH 00 of 10] Teach xm save to checkpoint a running domain
>I'm not too sure about the last couple of patches in this
>series. Because the checkpointing domain doesn't disconnect before
>calling suspend, it retains a few references to pages it doesn't
>own. These trigger a PT race detector in xc_linux_save, which causes
>it to abort. So the last couple of patches explicitly identify the
>references I've found so far (shared_info and some grant table shared
>pages) and simply zero those PTEs during save, since they'll be
>recreated on restore. Finding the grant table pages is a bit fragile -
>I walk the page table loaded in CR3 at the time of suspend looking for
>the virtual address I've stowed in the suspend record. I've only got
>code for two-level page tables at the moment, since I'm not convinced
>this is the right approach. Under what circumstances would a non-live
>save have an unsafe PTE race?

Pretty much any PT race in a non-live save/migrate is a bug; the domain is (in theory) suspended at this point, and all of the devices are disconnected. Since you've chosen not to 'disconnect' the devices, you'll get random updates occurring to any shared pages (shared via grants or directly shared with Xen).

> Maybe it's fine to simply zero these ptes without checking them.

I'd think not.

>Or maybe it'd be less fragile to get the owners of the pages from Xen
>and see if the guest has legitimate mappings to them? Comments?

I think the ideal thing to do here is to mirror the live migrate case, i.e. do a full 'disconnect' of devices, xenbus, console, event channels etc, and then bring them back up. It'll probably be possible to do this in a slightly more efficient / less intrusive fashion by just cauterising things in Xen (i.e. closing the event channel -> guest path but not unbinding the interdomain side).

For grants, you basically have to follow the live migrate case and be prepared to re-issue, since otherwise on resume (which is presumably desired at some point?) you'll have garbage in flight and/or lost requests.

Anyway, looks like an interesting start, and would be a nice feature to get into -unstable sometime post 3.0.4.

cheers,

S.
Brendan Cully
2006-Dec-16 00:04 UTC
Re: [Xen-devel] [PATCH 00 of 10] Teach xm save to checkpoint a running domain
I think maybe I forgot to mention that I have successfully checkpointed domains and restored them from checkpoints (with file-system activity between checkpoints). It seems to work pretty well. I'll try to put together a demo of this next week.

Regarding full device disconnection, my understanding is that guest domains are already prepared to deal with back-end driver crashes (by maintaining shadows of the ring etc), so a forced reconnect on resume should be able to recover even if there wasn't an orderly shutdown before the suspend. I thought when I looked over the code that the reconnect path did a paranoid forced disconnect first anyway (e.g. checking for existing event channels and resetting them).

On the other hand, if checkpoints are taken more frequently than they are restored, it seems odd to be constantly detaching and reattaching back-ends in the parent.

But if this is unsafe, it should be fairly easy to make the code do a full disconnect before suspend. It might be as easy as changing xm save to write 'suspend' to control/shutdown instead of 'checkpoint'.

On Friday, 15 December 2006 at 08:07, Steven Hand wrote:
>
> >I'm not too sure about the last couple of patches in this
> >series. Because the checkpointing domain doesn't disconnect before
> >calling suspend, it retains a few references to pages it doesn't
> >own. These trigger a PT race detector in xc_linux_save, which causes
> >it to abort. So the last couple of patches explicitly identify the
> >references I've found so far (shared_info and some grant table shared
> >pages) and simply zero those PTEs during save, since they'll be
> >recreated on restore. Finding the grant table pages is a bit fragile -
> >I walk the page table loaded in CR3 at the time of suspend looking for
> >the virtual address I've stowed in the suspend record. I've only got
> >code for two-level page tables at the moment, since I'm not convinced
> >this is the right approach. Under what circumstances would a non-live
> >save have an unsafe PTE race?
>
> Pretty much any PT race in a non-live save/migrate is a bug; the
> domain is (in theory) suspended at this point, and all of the
> devices are disconnected. Since you've chosen not to 'disconnect'
> the devices, you'll get random updates occurring to any shared
> pages (shared via grants or directly shared with Xen).
>
> > Maybe it's fine to simply zero these ptes without checking them.
>
> I'd think not.

To clarify, the pages that have caused races in my experiments are always the same 5: shared_info and four grant table shared pages. The reason these don't cause races in plain save is simply that they are unmapped before suspend is called. Since I've adjusted the kernel to recreate these specific pages on restore (but not in the parent when checkpoint returns), my patches do just zero out the PTEs (simulating in the save code what had previously been done in the guest).

Finding the guest grant table pages is a little annoying though. I ended up having the guest put the virtual address of its mapping into an unused field in the suspend record, then walking the page table to find the MFN. I was thinking it might be better to either get Xen to export a list of pages that the guest has references to, or to assume that any unowned MFNs in the page tables are pages that will be recreated on restore anyway and just zero them out. In short, I wonder how often that PT race code has stopped a non-live save. If the answer is 'never', then zeroing out the PTEs might be fine. Especially since the original domain is still intact after the checkpoint.

Thanks again for looking this over.
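The difference between zeroing only known mappings and zeroing every unowned MFN, as discussed above, can be sketched like this. It is an illustration rather than the code in xc_linux_save: the zero-terminated owned[] list stands in for MFN_IS_IN_PSEUDOPHYS_MAP, and classify_pte and the verdict names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

enum pte_verdict {
    PTE_OWNED,   /* frame belongs to the guest: translate as usual */
    PTE_ZAPPED,  /* known foreign mapping: entry cleared, no race reported */
    PTE_RACE     /* unknown foreign frame: report a page table race */
};

/* Lists are terminated by a 0 entry, like foreign_maps[] in the patch,
 * so frame number 0 itself cannot appear in a list. */
static int in_list(const unsigned long *list, unsigned long mfn)
{
    for (; *list != 0; list++)
        if (*list == mfn)
            return 1;
    return 0;
}

/* Decide what to do with one page table entry. Only entries that point at
 * an expected foreign frame are zeroed; anything else unowned is a race. */
enum pte_verdict classify_pte(uint64_t *pte,
                              const unsigned long *owned,
                              const unsigned long *expected)
{
    unsigned long mfn;

    assert(pte != NULL && owned != NULL && expected != NULL);

    mfn = (unsigned long)(*pte >> PAGE_SHIFT);
    if (in_list(owned, mfn))
        return PTE_OWNED;
    if (in_list(expected, mfn)) {
        *pte = 0;
        return PTE_ZAPPED;
    }
    return PTE_RACE;
}
```

Zeroing every unowned MFN without checking would amount to treating the PTE_RACE case like PTE_ZAPPED as well, which removes the detector altogether.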
Yoshiaki Tamura
2006-Dec-20 10:01 UTC
Re: [Xen-devel] [PATCH 00 of 10] Teach xm save to checkpoint a
Brendan:

Hi, my name is Yoshi Tamura, working for NTT Labs in Japan.
I tried your patches, and I liked your new feature to checkpoint a running domain.
I also tried your patches for live migration, but xc_linux_restore() on the remote machine failed.
I tracked down the problem and fixed it by modifying __xen_checkpoint() in machine_reboot.c. Take a look at the following patch.
As far as I have tested, it works for both xm save -c and xm migrate --live.
Let me know if you have any comments or a better idea.

Regards,

Yoshi Tamura


Signed-off-by: Yoshi Tamura <tamura.yoshiaki@lab.ntt.co.jp>

diff -r 3bde632518a4 linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c Thu Dec 14 23:05:42 2006 -0800
+++ b/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c Wed Dec 20 16:21:43 2006 +0900
@@ -171,8 +171,6 @@ int __xen_suspend(void)
 
         pre_suspend();
 
-        gnttab_checkpoint();
-
         /*
          * We'll stop somewhere inside this hypercall. When it returns,
          * we'll start resuming after the restore.
@@ -223,6 +221,8 @@ int __xen_checkpoint(void)
 
         xenbus_lock();
 
+        gnttab_suspend();
+
         preempt_disable();
 
         mm_pin_all();
@@ -257,6 +257,8 @@ int __xen_checkpoint(void)
         } else {
                 post_checkpoint();
 
+                gnttab_resume();
+
                 local_irq_enable();
 
                 xenbus_unlock();

--
TAMURA, Yoshiaki
NTT Cyber Space Labs
OSS Computing Project
Kernel Group
E-mail: tamura.yoshiaki@lab.ntt.co.jp
TEL: (046)-859-2771
FAX: (046)-855-1152
Address: 1-1 Hikarinooka, Yokosuka
Kanagawa 239-0847 JAPAN
Keir Fraser
2006-Dec-28 16:51 UTC
Re: [Xen-devel] [PATCH 07 of 10] Add new shutdown mode for checkpoint
On 15/12/06 6:38 am, "Brendan Cully" <brendan@cs.ubc.ca> wrote:

> Add new shutdown mode for checkpoint.
>
> When control/shutdown = checkpoint, invoke an alternate suspend path
> that doesn't disconnect from back ends, and only reconnects when the
> image has been restored into a new domain.

I don't think a new type of 'checkpoint' handler is required in the guest OS. We are already most of the way there in terms of doing as little as possible on the suspend side of save/restore, so we should fix up what little else there is to be done. Looking at the differences versus your new checkpointing suspend:
 1. xenbus_suspend() needs to stay. Actually most drivers do not have a suspend handler anyway (only tpmfront does). We should provide a suspend_cancelled() hook callback so that drivers which *do* have a suspend handler can distinguish between proper resume and checkpoint return.
 2. I don't think we really need to xs_unwatch() all our watches on xs_suspend(). Probably that code can just go.
 3. Keep gnttab_suspend(). It isn't really that slow to execute and avoids needing other hacks.
 4. Keep pre_suspend() and don't have special pre_checkpoint(). Again, it is fairly cheap to clear/renew the shared_info mapping.
 5. It would be nice to have a backward-compatible way for the guest to tell the tools that its suspension is cancellable. For this we could write an informative string into xen_start_info->magic[]. Notifying the guest of suspend-cancel versus restore can be done via %eax return code. For example, 0==suspend-cancel, +ve==restore, -ve==error. Old tools will leave %eax==__HYPERVISOR_sched_op, which will correctly map to 'restore'.

This allows us to use this cheap checkpoint framework to also provide easy cancellation of save/restore if something goes wrong (e.g., network connectivity fails during live migration).

 -- Keir
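The return-code convention in point 5 could be decoded along these lines. This is a minimal sketch: the enum and function names are hypothetical, and the value 29 for the sched_op hypercall number is an assumption taken from the public Xen headers. Any positive value behaves the same way.

```c
#include <assert.h>

/* Assumed value of __HYPERVISOR_sched_op, which older tools leave in %eax. */
#define TOY_HYPERVISOR_SCHED_OP 29

enum suspend_outcome {
    SUSPEND_CANCELLED,  /* 0: back in the original domain, nothing to rebuild */
    SUSPEND_RESTORED,   /* positive: running in a new domain, reconnect devices */
    SUSPEND_ERROR       /* negative: the suspend request failed */
};

/* Map the value returned by the suspend hypercall to an outcome. */
enum suspend_outcome classify_suspend_return(long ret)
{
    if (ret == 0)
        return SUSPEND_CANCELLED;
    if (ret > 0)
        return SUSPEND_RESTORED;
    return SUSPEND_ERROR;
}
```

Because the stale hypercall number is positive, a guest using this mapping under older tools always takes the full restore path, which is the backward-compatible behaviour described above.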
Brendan Cully
2007-Jan-09 21:33 UTC
Re: [Xen-devel] [PATCH 00 of 10] Teach xm save to checkpoint a
On Wednesday, 20 December 2006 at 19:01, Yoshiaki Tamura wrote:
> Brendan:
>
> Hi, my name is Yoshi Tamura, working for NTT Labs in Japan.
> I tried your patches, and I liked your new feature to checkpoint a running
> domain.
> I also tried your patches for live migration, but xc_linux_restore() on the
> remote machine failed.
> I tracked down the problem and fixed it by modifying __xen_checkpoint() in
> machine_reboot.c. Take a look at the following patch.
> As far as I have tested, it works for both xm save -c and xm migrate
> --live.
> Let me know if you have any comments or better idea.

Hi Yoshi,

sorry for the late reply - I went on vacation shortly after your post. I'm working on incorporating Keir's feedback at the moment, which seems to include your suggestion. I'll post a new patch series soon.

By the way, how were you doing checkpointed live migration? Didn't the old and new domains fight over the network and block devices?
Yoshiaki Tamura
2007-Jan-12 00:56 UTC
Re: [Xen-devel] [PATCH 00 of 10] Teach xm save to checkpoint a
Hi Brendan,

> sorry for the late reply - I went on vacation shortly after your
> post.

No problem at all. I was on vacation too.

> I'm working on incorporating Keir's feedback at the moment,
> which seems to include your suggestion. I'll post a new patch series
> soon.

That's nice to hear. I'm looking forward to seeing the new patches.

> By the way, how were you doing checkpointed live migration? Didn't the
> old and new domains fight over the network and block devices?

Of course if you unpause the new domain, they will fight and destroy the shared storage. I just keep the new domain paused until the old domain has disappeared. If the domain's workload is low, you can see the identical domains running concurrently for a moment just for fun. But you should be careful :-)

Thanks,

Yoshi
Brendan Cully
2007-Jan-12 01:25 UTC
Re: [Xen-devel] [PATCH 07 of 10] Add new shutdown mode for checkpoint
Hi Keir,

Thanks for looking over these patches. I've updated my patch set according to your comments, and I'll send them along in a separate thread. But I've replied to your comments below.

On Thursday, 28 December 2006 at 16:51, Keir Fraser wrote:
> On 15/12/06 6:38 am, "Brendan Cully" <brendan@cs.ubc.ca> wrote:
>
> > Add new shutdown mode for checkpoint.
> >
> > When control/shutdown = checkpoint, invoke an alternate suspend path
> > that doesn't disconnect from back ends, and only reconnects when the
> > image has been restored into a new domain.
>
> I don't think a new type of 'checkpoint' handler is required in the guest
> OS. We are already most of the way there in terms of doing as little as
> possible on the suspend side of save/restore, so we should fix up what
> little else there is to be done. Looking at the differences versus your new
> checkpointing suspend:
> 1. xenbus_suspend() needs to stay. Actually most drivers do not have a
> suspend handler anyway (only tpmfront does). We should provide a
> suspend_cancelled() hook callback so that drivers which *do* have a suspend
> handler can distinguish between proper resume and checkpoint return.

Hmm. I didn't see the tpmfront driver in my linux-2.6-xen-sparse/drivers/xen directory, and I'm not sure what the in-guest suspend code is supposed to do (or how it would roll back). I've left the suspend code active in the new patch set, but not yet added the cancellation hook, since there's nothing to use it or test it now anyway. I've also disabled the device resume hook in the source resume path: that code tears down the existing backend connection and waits for a new one to be set up, which never happens in the source domain. How would you prefer that this work?

> 2. I don't think we really need to xs_unwatch() all our watches on
> xs_suspend(). Probably that code can just go.

ok, done.

> 3. Keep gnttab_suspend(). It isn't really that slow to execute and avoids
> needing other hacks.

ok, done.

> 4. Keep pre_suspend() and don't have special pre_checkpoint(). Again, it is
> fairly cheap to clear/renew the shared_info mapping.

ok, done.

> 5. It would be nice to have a backward-compatible way for the guest to tell
> the tools that its suspension is cancellable. For this we could write an
> informative string into xen_start_info->magic[]. Notifying the guest of

As I understood it, xen_start_info (the suspend record) isn't available to the save code until the guest passes it along as the argument to the suspend hypercall. Isn't this too late to be useful? I suppose the guest could write some feature flag to xenstore though.

> suspend-cancel versus restore can be done via %eax return code. For example,
> 0==suspend-cancel, +ve==restore, -ve==error. Old tools will leave
> %eax==__HYPERVISOR_sched_op, which will correctly map to 'restore'.

I'm not sure I see the difference from the guest's point of view between a cancelled suspend and an error. So I still only put 0 or 1 into eax.

> This allows us to use this cheap checkpoint framework to also provide easy
> cancellation of save/restore if something goes wrong (e.g., network
> connectivity fails during live migration).

Sure. As far as I can see, a cancellable save is just a checkpoint where the parent is destroyed on commit.
Brendan Cully
2007-Jan-12 23:58 UTC
Re: [Xen-devel] [PATCH 07 of 10] Add new shutdown mode for checkpoint
On Thursday, 28 December 2006 at 16:51, Keir Fraser wrote:
> On 15/12/06 6:38 am, "Brendan Cully" <brendan@cs.ubc.ca> wrote:
>
> > Add new shutdown mode for checkpoint.
> >
> > When control/shutdown = checkpoint, invoke an alternate suspend path
> > that doesn't disconnect from back ends, and only reconnects when the
> > image has been restored into a new domain.
>
> I don't think a new type of 'checkpoint' handler is required in the guest
> OS. We are already most of the way there in terms of doing as little as
> possible on the suspend side of save/restore, so we should fix up what
> little else there is to be done. Looking at the differences versus your new
> checkpointing suspend:
> 1. Xenbus_suspend() needs to stay. Actually most drivers do not have a
> suspend handler anyway (only tpmfront does). We should provide a
> suspend_cancelled() hook callback so that drivers which *do* have a suspend
> handler can distinguish between proper resume and checkpoint return.

I guess this is what you had in mind? I don't have anything that uses it at the moment though. It's just suspend_dev cut'n'pasted.
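As a rough illustration of the hook being discussed (this is not the attached patch), a per-driver cancel callback might be dispatched as below. Every name in this sketch (toy_driver, suspend_cancel, toy_after_suspend) is hypothetical and none of it is taken from xenbus. It assumes the hook is optional, so drivers without a suspend handler need no changes.

```c
#include <assert.h>
#include <stddef.h>

struct toy_device;

/* Hypothetical driver operations; suspend_cancel may be NULL. */
struct toy_driver {
    int (*suspend_cancel)(struct toy_device *dev);
    int (*resume)(struct toy_device *dev);
};

struct toy_device {
    const struct toy_driver *drv;
    int cancels;     /* times the cancel hook ran */
    int reconnects;  /* times the full resume path ran */
};

/* Example callbacks that only count how often they are invoked. */
int toy_undo_suspend(struct toy_device *dev) { dev->cancels++; return 0; }
int toy_reconnect(struct toy_device *dev)    { dev->reconnects++; return 0; }

/* Run after the suspend hypercall returns. If the suspend was cancelled we
 * are still in the original domain, so only the cancel hook runs and the
 * existing backend connection is left alone. Otherwise the full resume
 * path reconnects every device. Returns the first nonzero hook result. */
int toy_after_suspend(struct toy_device *devs, size_t n, int cancelled)
{
    int err = 0;
    size_t i;

    assert(devs != NULL || n == 0);

    for (i = 0; i < n; i++) {
        const struct toy_driver *drv = devs[i].drv;
        int (*hook)(struct toy_device *) =
            cancelled ? drv->suspend_cancel : drv->resume;
        int rc = hook ? hook(&devs[i]) : 0;

        if (rc != 0 && err == 0)
            err = rc;
    }
    return err;
}
```

Keeping the two paths separate avoids the problem mentioned earlier in the thread, where running the ordinary resume hook in the source domain tears down a backend connection that is never re-established.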