Frank Pan
2011-Mar-03 10:13 UTC
[Xen-devel] [PATCH] Make explicit message when guest failed to suspend
Recent xen uses xenbus to suspend PV-on-HVM guest domain. The related code in the xen-unstable tree will fall into infinite loop when guest domain failed on suspending one or more devices. The patch attached changes the logic, and raises an XendError after 1 minute waiting. The patch also makes use of "control/shutdown" entry in xenstore, allows guest kernel report the failure of the suspending. Any suggestions? --- linux-2.6-xen/drivers/xen/manage.c | 8 ++++++++ tools/python/xen/xend/XendDomainInfo.py | 27 ++++++++++++++++++--------- 2 files changed, 26 insertions(+), 9 deletions(-) diff --git a/linux-2.6-xen/drivers/xen/manage.c b/linux-2.6-xen/drivers/xen/manage.c index 0b50906..5b9c132 100644 --- a/linux-2.6-xen/drivers/xen/manage.c +++ b/linux-2.6-xen/drivers/xen/manage.c @@ -165,6 +165,8 @@ out_destroy_sm: stop_machine_destroy(); out: + if (cancelled) + xenbus_write(XBT_NIL, "control", "shutdown", "failed"); shutting_down = SHUTDOWN_INVALID; } #endif /* CONFIG_PM_SLEEP */ @@ -190,6 +192,11 @@ static void shutdown_handler(struct xenbus_watch *watch, xenbus_transaction_end(xbt, 1); return; } + /* Ignore failed message, posted by self */ + if (strcmp(str, "failed") == 0) { + xenbus_transaction_end(xbt, 1); + return; + } xenbus_write(xbt, "control", "shutdown", ""); @@ -212,6 +219,7 @@ static void shutdown_handler(struct xenbus_watch *watch, #endif } else { printk(KERN_INFO "Ignoring shutdown request: %s\n", str); + xenbus_write(XBT_NIL, "control", "shutdown", "failed"); shutting_down = SHUTDOWN_INVALID; } diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py index d5e92be..c652939 100644 --- a/tools/python/xen/xend/XendDomainInfo.py +++ b/tools/python/xen/xend/XendDomainInfo.py @@ -1882,7 +1882,7 @@ class XendDomainInfo: reason = self.readDom(''control/shutdown'') - if reason and reason != ''suspend'': + if reason and reason != ''suspend'' and reason != ''failed'': sst = self.readDom(''xend/shutdown_start_time'') now = time.time() if sst: @@ -2990,14 +2990,23 @@ class XendDomainInfo: try: while self._stateGet() in (DOM_STATE_RUNNING,DOM_STATE_PAUSED): self.state_updated.wait(1.0) - if state == "suspend": - if nr_tries == 0: - msg = (''Timeout waiting for domain %s to suspend'' - % self.domid) - self._writeDom(''control/shutdown'', '''') - raise XendError(msg) - state = self.readDom(''control/shutdown'') - nr_tries -= 1 + + state = self.readDom(''control/shutdown'') + + if state == ''failed'': + msg = (''Domain %s suspend failed. '' + ''Check kernel log of domain for detail.'' + % self.domid) + self._writeDom(''control/shutdown'', '''') + raise XendError(msg) + + if nr_tries == 0: + msg = (''Timeout waiting for domain %s to suspend'' + % self.domid) + self._writeDom(''control/shutdown'', '''') + raise XendError(msg) + nr_tries -= 1 + finally: self.state_updated.release() -- 1.7.1 -- Frank Pan Computer Science and Technology Tsinghua University _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Mar-03 10:56 UTC
[Xen-devel] Re: [PATCH] Make explicit message when guest failed to suspend
On Thu, 2011-03-03 at 10:13 +0000, Frank Pan wrote:> Recent xen uses xenbus to suspend PV-on-HVM guest domain. The related > code in the xen-unstable tree will fall into infinite loop when guest > domain failed on suspending one or more devices.This is a bug in Xend, it should never enter an infinite loop regardless of guest behaviour. The code certainly appears to expect to be managing a timeout, perhaps it has bitrotted?> The patch attached changes the logic, and raises an XendError after 1 > minute waiting. The patch also makes use of "control/shutdown" entry > in xenstore, allows guest kernel report the failure of the suspending.I don''t think you can change the xenbus protocol in this way without further rationale regarding it''s correctness. In particular you need to consider and explain how it remains compatible with a new kernel running on older toolstacks and vice versa.> Any suggestions?Perhaps a separate control/shutdown-error key? The message written should be more verbose than just "failed" if/when more specific information is available to the kernel.> @@ -165,6 +165,8 @@ out_destroy_sm: > stop_machine_destroy(); > > out: > + if (cancelled) > + xenbus_write(XBT_NIL, "control", "shutdown", "failed");cancelled does not necessarily imply failure, it can mean the suspend resulted in a checkpoint rather than a full suspend. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank Pan
2011-Mar-03 11:22 UTC
[Xen-devel] Re: [PATCH] Make explicit message when guest failed to suspend
> This is a bug in Xend, it should never enter an infinite loop regardless > of guest behaviour. The code certainly appears to expect to be managing > a timeout, perhaps it has bitrotted? > > I don''t think you can change the xenbus protocol in this way without > further rationale regarding it''s correctness. > > Perhaps a separate control/shutdown-error key? The message written > should be more verbose than just "failed" if/when more specific > information is available to the kernel.FP: Maybe I should seperate it into 2 patches? One for bug fixing, the other for new protocol?> In particular you need to consider and explain how it remains compatible > with a new kernel running on older toolstacks and vice versa. >FP: Right. This is something I haven''t considered yet. -- Frank Pan Computer Science and Technology Tsinghua University _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Mar-03 11:41 UTC
[Xen-devel] Re: [PATCH] Make explicit message when guest failed to suspend
On Thu, 2011-03-03 at 11:22 +0000, Frank Pan wrote:> > This is a bug in Xend, it should never enter an infinite loop regardless > > of guest behaviour. The code certainly appears to expect to be managing > > a timeout, perhaps it has bitrotted? > > > > I don''t think you can change the xenbus protocol in this way without > > further rationale regarding it''s correctness. > > > > Perhaps a separate control/shutdown-error key? The message written > > should be more verbose than just "failed" if/when more specific > > information is available to the kernel. > > FP: Maybe I should seperate it into 2 patches? One for bug fixing, the > other for new protocol?The patch to xend should be separate in any case -- since the kernel and xend live in different trees. also if you are changing the protocol you need to implement it for libxl/xl as well. so you actually need (are least) two series: xen-unstable.hg 1/2 -- xend bugfix xen-unstable.hg 2/2 -- xend+libxl/xl side of protocol change kernel 1/1 -- protocol change etc.> > > In particular you need to consider and explain how it remains compatible > > with a new kernel running on older toolstacks and vice versa. > > > > FP: Right. This is something I haven''t considered yet. >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank Pan
2011-Mar-03 14:28 UTC
[Xen-devel] Re: [PATCH] Make explicit message when guest failed to suspend
> also if you are changing the protocol you need to implement it for > libxl/xl as well. so you actually need (are least) two series:FP: I''ve never used xl before. All commands I use start with xm. Is xl a substitute of xm? Also, is xm deprecated? -- Frank Pan Computer Science and Technology Tsinghua University _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Mar-03 14:50 UTC
[Xen-devel] Re: [PATCH] Make explicit message when guest failed to suspend
On Thu, 2011-03-03 at 14:28 +0000, Frank Pan wrote:> > also if you are changing the protocol you need to implement it for > > libxl/xl as well. so you actually need (are least) two series: > > FP: I''ve never used xl before. All commands I use start with xm. > Is xl a substitute of xm?Yes, it is a standalone toolstack intended as a drop in replacement for the xm interface to xend. See http://wiki.xensource.com/xenwiki/MigrationGuideToXen4.1+> Also, is xm deprecated?xend remains supported in 4.1 but users are strongly recommended to migrate to xl. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2011-Mar-03 17:19 UTC
[Xen-devel] Re: [PATCH] Make explicit message when guest failed to suspend
Frank Pan writes ("Re: [PATCH] Make explicit message when guest failed to suspend"):> I''ve never used xl before. All commands I use start with xm. > Is xl a substitute of xm? Also, is xm deprecated?xl is preferred nowadays. xm is still supported in Xen 4.1 (current xen-unstable) but will probably be removed some point in the 4.2 release cycle. xl is supposed to be a mostly drop-in replacement for xm; you may need to adjust your startup scripts to use it. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel