Shriram Rajagopalan
2011-Apr-07 20:05 UTC
[Xen-devel] [PATCH] remus: proper cleanup on checkpoint failure
# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1302204999 25200
# Node ID a73514445065390ae70c44e1708971dd6fa2a6f0
# Parent  97763efc41f9b664cf6f7db653c9c3f51e50b358
remus: proper cleanup on checkpoint failure.

While running remus, when an error occurs during checkpointing
(e.g., timeouts on primary, failing to checkpoint network buffer
or disk or even communication failure) the domU is sometimes
left in suspended state on primary.

Instead of blindly closing the checkpoint file handle, attempt
to resume the domain before the close.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

diff -r 97763efc41f9 -r a73514445065 tools/python/xen/lowlevel/checkpoint/checkpoint.c
--- a/tools/python/xen/lowlevel/checkpoint/checkpoint.c	Tue Apr 05 18:23:54 2011 +0100
+++ b/tools/python/xen/lowlevel/checkpoint/checkpoint.c	Thu Apr 07 12:36:39 2011 -0700
@@ -80,6 +80,9 @@
 {
   CheckpointObject* self = (CheckpointObject*)obj;
 
+  if (checkpoint_resume(&self->cps) < 0)
+    fprintf(stderr, "%s\n", checkpoint_error(&self->cps));
+
   checkpoint_close(&self->cps);
 
   Py_XDECREF(self->suspend_cb);
diff -r 97763efc41f9 -r a73514445065 tools/python/xen/remus/save.py
--- a/tools/python/xen/remus/save.py	Tue Apr 05 18:23:54 2011 +0100
+++ b/tools/python/xen/remus/save.py	Thu Apr 07 12:36:39 2011 -0700
@@ -158,9 +158,13 @@
             self.checkpointer.open(self.vm.domid)
             self.checkpointer.start(self.fd, self.suspendcb, self.resumecb,
                                     self.checkpointcb, self.interval)
-            self.checkpointer.close()
         except xen.lowlevel.checkpoint.error, e:
             raise CheckpointError(e)
+        finally:
+            try: #errors in checkpoint close are not critical atm.
+                self.checkpointer.close()
+            except:
+                pass
 
     def _resume(self):
         """low-overhead version of XendDomainInfo.resumeDomain"""

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
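The cleanup ordering the patch establishes (close always runs via `finally`, the close path tries to resume the domain before closing the handle, and errors from close are swallowed) can be sketched in plain Python 3. This is a minimal illustration under stated assumptions, not the Remus code: `FakeCheckpointer`, `run_checkpoints` and this `CheckpointError` are hypothetical stand-ins, `start()` takes no arguments here instead of the real fd and callbacks, and the C-side "resume before close" step is modelled inside the stub's `close()`.

```python
import sys


class CheckpointError(Exception):
    """Hypothetical stand-in for the error that save.py raises."""


class FakeCheckpointer:
    """Hypothetical stand-in for the xen.lowlevel.checkpoint object.

    It only records the order of calls and whether the guest is left
    suspended, so the cleanup sequence can be checked.
    """

    def __init__(self, fail_start=False, fail_resume=False, fail_close=False):
        self.fail_start = fail_start
        self.fail_resume = fail_resume
        self.fail_close = fail_close
        self.calls = []
        self.suspended = False

    def open(self, domid):
        self.calls.append("open")

    def start(self):
        self.calls.append("start")
        # Simulate a checkpoint that suspends the guest and may then fail
        # (e.g. a timeout), leaving the guest suspended.
        self.suspended = True
        if self.fail_start:
            raise RuntimeError("checkpoint timed out")
        self.suspended = False

    def close(self):
        # Models the C hunk: attempt a resume first, report a resume
        # failure on stderr, then close the handle regardless.
        self.calls.append("resume")
        if self.fail_resume:
            print("resume failed", file=sys.stderr)
        else:
            self.suspended = False
        self.calls.append("close")
        if self.fail_close:
            raise RuntimeError("close failed")


def run_checkpoints(checkpointer, domid):
    """Sketch of the patched save.py flow: close() runs on every path."""
    try:
        checkpointer.open(domid)
        checkpointer.start()
    except RuntimeError as e:
        raise CheckpointError(e)
    finally:
        try:
            checkpointer.close()
        except Exception:
            # As in the patch, errors from close are treated as non-critical.
            pass


if __name__ == "__main__":
    cp = FakeCheckpointer(fail_start=True, fail_close=True)
    try:
        run_checkpoints(cp, domid=1)
    except CheckpointError as err:
        print("checkpoint failed:", err)
    print(cp.calls, "suspended:", cp.suspended)
```

Even when both the checkpoint and the close fail, the original `CheckpointError` still reaches the caller and the stubbed guest is no longer suspended. The sketch catches `Exception` rather than using the patch's bare `except:`, so that `KeyboardInterrupt` and `SystemExit` are not silently discarded during cleanup.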
Ian Jackson
2011-Apr-08 15:55 UTC
Re: [Xen-devel] [PATCH] remus: proper cleanup on checkpoint failure [and 1 more messages]
Shriram Rajagopalan writes ("[Xen-devel] [PATCH] remus: proper cleanup on checkpoint failure"):
> remus: proper cleanup on checkpoint failure.

Shriram Rajagopalan writes ("[Xen-devel] [SPAM] [PATCH] remus: blackhole replication target"):
> remus: blackhole replication target

Thanks, I have applied both.

Ian.