john.levon@sun.com
2007-May-25 03:42 UTC
[Xen-devel] [PATCH] Do not destroy domains that timeout when shutting down
# HG changeset patch
# User john.levon@sun.com
# Date 1180046767 25200
# Node ID bfd2245773b4772671a5aa7772ff88554292de34
# Parent c7dee061ade1ac0c4ef417ecadd98f048ddece96
Do not destroy domains that timeout when shutting down

Instead of violently destroying a domain that is not responding to a shutdown
request, rename the domain to indicate the problem and leave it alone; this
allows the admin to make corrective actions (which may or may not include
destroying the domain).

Signed-off-by: John Levon <john.levon@sun.com>

diff --git a/tools/python/xen/xend/XendDomainInfo.py b/tools/python/xen/xend/XendDomainInfo.py
--- a/tools/python/xen/xend/XendDomainInfo.py
+++ b/tools/python/xen/xend/XendDomainInfo.py
@@ -351,6 +351,7 @@ class XendDomainInfo:
         self.vmWatch = None
         self.shutdownWatch = None
         self.shutdownStartTime = None
+        self.unresponsive = False
         self._resume = resume
 
         self.state = DOM_STATE_HALTED
@@ -998,21 +999,25 @@ class XendDomainInfo:
                     # failed. Ignore this domain.
                     pass
             else:
-                # Domain is alive. If we are shutting it down, then check
-                # the timeout on that, and destroy it if necessary.
+                # Domain is alive.
                 if xeninfo['paused']:
                     self._stateSet(DOM_STATE_PAUSED)
                 else:
                     self._stateSet(DOM_STATE_RUNNING)
 
-                if self.shutdownStartTime:
+                if self.shutdownStartTime and not self.unresponsive:
                     timeout = (SHUTDOWN_TIMEOUT - time.time() +
                                self.shutdownStartTime)
                     if timeout < 0:
+                        # The domain is not responding to shutdown requests.
+                        # Log a message, and rename the domain to indicate the
+                        # state; we keep the domain running, however, to
+                        # allow corrective action.
                         log.info(
                             "Domain shutdown timeout expired: name=%s id=%s",
                             self.info['name_label'], self.domid)
-                        self.destroy()
+                        self.setName('unresponsive-' + self.getName())
+                        self.unresponsive = True
         finally:
             self.refresh_shutdown_lock.release()
 
@@ -1299,6 +1304,7 @@ class XendDomainInfo:
         log.debug('XendDomainInfo.constructDomain')
 
         self.shutdownStartTime = None
+        self.unresponsive = False
 
         image_cfg = self.info.get('image', {})
         hvm = image_cfg.has_key('hvm')

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
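[Editor's sketch, not part of the original message.] The behaviour the patch describes can be illustrated in isolation. The class below is a hypothetical stand-in, not xend code: the names DomainSketch, request_shutdown and refresh, and the 30-second value of SHUTDOWN_TIMEOUT, are assumptions made for illustration. It only mirrors the logic visible in the diff: once the timeout expires, the domain is renamed once and flagged, and it is never destroyed.

```python
import time

SHUTDOWN_TIMEOUT = 30.0  # seconds; assumed value, for illustration only


class DomainSketch:
    """Hypothetical stand-in for the shutdown bookkeeping in the patch."""

    def __init__(self, name):
        self.name = name
        self.shutdown_start_time = None  # set when a shutdown is requested
        self.unresponsive = False        # set once the timeout has expired

    def request_shutdown(self, now=None):
        # Record when the shutdown request was made.
        self.shutdown_start_time = time.time() if now is None else now

    def refresh(self, now=None):
        """Mimic the periodic check: rename once and flag, never destroy."""
        now = time.time() if now is None else now
        if self.shutdown_start_time is not None and not self.unresponsive:
            remaining = SHUTDOWN_TIMEOUT - now + self.shutdown_start_time
            if remaining < 0:
                self.name = 'unresponsive-' + self.name
                self.unresponsive = True
        return self.name
```

The unresponsive flag is what keeps the rename from being applied again on every later refresh; without it the prefix would accumulate each time the check ran.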
John Levon
2007-May-25 12:18 UTC
Re: [Xen-devel] [PATCH] Do not destroy domains that timeout when shutting down
On Thu, May 24, 2007 at 08:42:06PM -0700, john.levon@sun.com wrote:

> # HG changeset patch
> # User john.levon@sun.com
> # Date 1180046767 25200
> # Node ID bfd2245773b4772671a5aa7772ff88554292de34
> # Parent c7dee061ade1ac0c4ef417ecadd98f048ddece96
> Do not destroy domains that timeout when shutting down

This patch doesn't work properly for managed domains. Working on a new
one.

regards
john
Keir Fraser
2007-May-25 12:37 UTC
Re: [Xen-devel] [PATCH] Do not destroy domains that timeout when shutting down
On 25/5/07 13:18, "John Levon" <levon@movementarian.org> wrote:

>> # HG changeset patch
>> # User john.levon@sun.com
>> # Date 1180046767 25200
>> # Node ID bfd2245773b4772671a5aa7772ff88554292de34
>> # Parent c7dee061ade1ac0c4ef417ecadd98f048ddece96
>> Do not destroy domains that timeout when shutting down
>
> This patch doesn't work properly for managed domains. Working on a new
> one.

I'll revert the old one then.

 -- Keir
Daniel P. Berrange
2007-May-25 13:09 UTC
Re: [Xen-devel] [PATCH] Do not destroy domains that timeout when shutting down
On Thu, May 24, 2007 at 08:42:06PM -0700, john.levon@sun.com wrote:

> # HG changeset patch
> # User john.levon@sun.com
> # Date 1180046767 25200
> # Node ID bfd2245773b4772671a5aa7772ff88554292de34
> # Parent c7dee061ade1ac0c4ef417ecadd98f048ddece96
> Do not destroy domains that timeout when shutting down
>
> Instead of violently destroying a domain that is not responding to a shutdown
> request, rename the domain to indicate the problem and leave it alone; this
> allows the admin to make corrective actions (which may or may not include
> destroying the domain).

Do we really need to rename it ? Various bits of code do uniqueness checks
on both name & uuid. If we rename it, though uuid checks still work, any such
checks on the name will not be correct. IMHO, if shutdown fails we should just
leave the guest alone completely. The admin knows that they requested a shutdown,
and can clearly see that after 'n' seconds it is still around & can take the
appropriate action - admins already know to watch for failed shutdowns since
this stuff happens on bare metal too. Renaming to 'report' error just seems
wrong to me & will potentially confuse tools.

Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
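[Editor's sketch, not part of the original message.] The concern about uniqueness checks can be shown with a toy example. The Registry class below is hypothetical and does not correspond to any xend or libvirt API; it only assumes a tool that tracks domains by both name and UUID, and then simulates the rename that the patch performs after a shutdown timeout.

```python
import uuid


class Registry:
    """Hypothetical tool-side view of domains, indexed by name and UUID."""

    def __init__(self):
        self.domains = []  # list of dicts with 'name' and 'uuid' keys

    def add(self, name, dom_uuid):
        self.domains.append({'name': name, 'uuid': dom_uuid})

    def name_in_use(self, name):
        return any(d['name'] == name for d in self.domains)

    def uuid_in_use(self, dom_uuid):
        return any(d['uuid'] == dom_uuid for d in self.domains)


reg = Registry()
guest_uuid = uuid.uuid4()
reg.add('guest1', guest_uuid)

# Simulate the rename applied to a domain that did not shut down in time.
reg.domains[0]['name'] = 'unresponsive-' + reg.domains[0]['name']

# The name check no longer finds the still-running guest ...
name_check = reg.name_in_use('guest1')
# ... while the UUID check is unaffected by the rename.
uuid_check = reg.uuid_in_use(guest_uuid)
```

Under these assumptions, a tool that guards against duplicate names would consider 'guest1' free even though the original guest is still running, which is the kind of confusion the message warns about.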
John Levon
2007-May-25 14:18 UTC
Re: [Xen-devel] [PATCH] Do not destroy domains that timeout when shutting down
On Fri, May 25, 2007 at 02:09:23PM +0100, Daniel P. Berrange wrote:

> > Instead of violently destroying a domain that is not responding to a shutdown
> > request, rename the domain to indicate the problem and leave it alone; this
> > allows the admin to make corrective actions (which may or may not include
> > destroying the domain).
>
> Do we really need to rename it ? Various bits of code do uniqueness checks
> on both name & uuid. If we rename it, though uuid checks still work, any such
> checks on the name will not be correct. IMHO, if shutdown fails we should just
> leave the guest alone completely. The admin knows that they requested a shutdown,
> and can clearly see that after 'n' seconds it is still around & can take the
> appropriate action - admins already know to watch for failed shutdowns since
> this stuff happens on bare metal too. Renaming to 'report' error just seems
> wrong to me & will potentially confuse tools.

Seems like a reasonable argument to me... Keir?

regards
john
Keir Fraser
2007-May-25 14:20 UTC
Re: [Xen-devel] [PATCH] Do not destroy domains that timeout when shutting down
On 25/5/07 15:18, "John Levon" <levon@movementarian.org> wrote:

>> Do we really need to rename it ? Various bits of code do uniqueness checks
>> on both name & uuid. If we rename it, though uuid checks still work, any such
>> checks on the name will not be correct. IMHO, if shutdown fails we should
>> just leave the guest alone completely. The admin knows that they requested a
>> shutdown, and can clearly see that after 'n' seconds it is still around & can
>> take the appropriate action - admins already know to watch for failed
>> shutdowns since this stuff happens on bare metal too. Renaming to 'report'
>> error just seems wrong to me & will potentially confuse tools.
>
> Seems like a reasonable argument to me... Keir?

I'm not too fussed either way really.

 -- Keir