Shriram Rajagopalan
2011-May-17 03:54 UTC
[Xen-devel] [PATCH] remus: support DRBD disk backends
# HG changeset patch
# User Shriram Rajagopalan <rshriram@cs.ubc.ca>
# Date 1305604364 25200
# Node ID 666dd2576bc41ccb9f4cb74c54bd16a6425e7b14
# Parent 5fe52d24d88f3beea273d48c18d9e5ac8858ccac
remus: support DRBD disk backends

DRBD disk backends can be used instead of tapdisk backends for Remus.
This requires a Remus style disk replication protocol (asynchronous
replication with output buffering at backup), that is not available in
standard DRBD code. A modified version that supports this new replication
protocol is available from git://aramis.nss.cs.ubc.ca/drbd-8.3-remus

Use of DRBD disk backends provides a means for efficient resynchronization
of data after the crashed machine comes back online. Since DRBD allows for
online resynchronization, a DRBD backed Remus VM does not have to be
stopped or shutdown while the disks are resynchronizing. Once
resynchronization is complete, Remus can be started at will.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>

diff -r 5fe52d24d88f -r 666dd2576bc4 tools/python/xen/remus/device.py
--- a/tools/python/xen/remus/device.py Mon May 16 20:47:04 2011 -0700
+++ b/tools/python/xen/remus/device.py Mon May 16 20:52:44 2011 -0700
@@ -2,7 +2,7 @@
 #
 # Coordinates with devices at suspend, resume, and commit hooks
 
-import os, re
+import os, re, fcntl
 
 import netlink, qdisc, util
 
@@ -30,22 +30,51 @@
     is paused between epochs.
     """
     FIFODIR = '/var/run/tap'
+    SEND_CHECKPOINT = 20
+    WAIT_CHECKPOINT_ACK = 30
 
     def __init__(self, disk):
         # look up disk, make sure it is tap:buffer, and set up socket
         # to request commits.
         self.ctlfd = None
+        self.msgfd = None
+        self.is_drbd = False
+        self.ackwait = False
 
-        if not disk.uname.startswith('tap:remus:') and not disk.uname.startswith('tap:tapdisk:remus:'):
+        if disk.uname.startswith('tap:remus:') or disk.uname.startswith('tap:tapdisk:remus:'):
+            fifo = re.match("tap:.*(remus.*)\|", disk.uname).group(1).replace(':', '_')
+            absfifo = os.path.join(self.FIFODIR, fifo)
+            absmsgfifo = absfifo + '.msg'
+
+            self.installed = False
+            self.ctlfd = open(absfifo, 'w+b')
+            self.msgfd = open(absmsgfifo, 'r+b')
+        elif disk.uname.startswith('drbd:'):
+            #get the drbd device associated with this resource
+            drbdres = re.match("drbd:(.*)", disk.uname).group(1)
+            drbddev = util.runcmd("drbdadm sh-dev %s" % drbdres).rstrip()
+
+            #check for remus supported drbd installation
+            rconf = util.runcmd("drbdsetup %s show" % drbddev)
+            if rconf.find('protocol D;') == -1:
+                raise ReplicatedDiskException('Remus support for DRBD disks requires the '
+                                              'resources to operate in protocol D. Please make '
+                                              'sure that you have installed the remus supported DRBD '
+                                              'version from git://aramis.nss.cs.ubc.ca/drbd-8.3-remus '
+                                              'and enabled protocol D in the resource config')
+
+            #check if resource is in connected state
+            cstate = util.runcmd("drbdadm cstate %s" % drbdres).rstrip()
+            if cstate != 'Connected':
+                raise ReplicatedDiskException('DRBD resource %s is not in connected state!'
+                                              % drbdres)
+
+            #open a handle to the resource so that we could issue chkpt ioctls
+            self.ctlfd = open(drbddev, 'r')
+            self.is_drbd = True
+        else:
             raise ReplicatedDiskException('Disk is not replicated: %s' %
                                           str(disk))
-        fifo = re.match("tap:.*(remus.*)\|", disk.uname).group(1).replace(':', '_')
-        absfifo = os.path.join(self.FIFODIR, fifo)
-        absmsgfifo = absfifo + '.msg'
-
-        self.installed = False
-        self.ctlfd = open(absfifo, 'w+b')
-        self.msgfd = open(absmsgfifo, 'r+b')
 
     def __del__(self):
         self.uninstall()
@@ -56,12 +85,24 @@
         self.ctlfd = None
 
     def postsuspend(self):
-        os.write(self.ctlfd.fileno(), 'flush')
+        if not self.is_drbd:
+            os.write(self.ctlfd.fileno(), 'flush')
+        elif not self.ackwait:
+            if (fcntl.ioctl(self.ctlfd.fileno(), self.SEND_CHECKPOINT, 0) > 0):
+                self.ackwait = False
+            else:
+                self.ackwait = True
+
+    def preresume(self):
+        if self.is_drbd and self.ackwait:
+            fcntl.ioctl(self.ctlfd.fileno(), self.WAIT_CHECKPOINT_ACK, 0)
+            self.ackwait = False
 
     def commit(self):
-        msg = os.read(self.msgfd.fileno(), 4)
-        if msg != 'done':
-            print 'Unknown message: %s' % msg
+        if not self.is_drbd:
+            msg = os.read(self.msgfd.fileno(), 4)
+            if msg != 'done':
+                print 'Unknown message: %s' % msg
 
 ### Network
 
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
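The DRBD half of postsuspend() and preresume() in the patch above amounts to a small two-state handshake, and it may be easier to follow in isolation. The following is only a sketch in modern Python, not the code that was committed: the class name DrbdCheckpointer and the injected ioctl argument are hypothetical, introduced so the logic can be exercised without a DRBD device. The request numbers 20 and 30 are the ones the patch defines. The thread does not say what a positive return value from the modified driver means; the sketch only mirrors the patch in treating it as "no acknowledgement to wait for".

```python
SEND_CHECKPOINT = 20       # ioctl request numbers as defined in the patch
WAIT_CHECKPOINT_ACK = 30


class DrbdCheckpointer:
    """Hypothetical stand-in for the DRBD branch of the patched disk class."""

    def __init__(self, fd, ioctl):
        self.fd = fd
        # Injected callable with the (fd, request, arg) shape of fcntl.ioctl;
        # real use would pass fcntl.ioctl itself.
        self.ioctl = ioctl
        self.ackwait = False

    def postsuspend(self):
        # Do not send a new checkpoint while an ack is still outstanding.
        if self.ackwait:
            return
        # As in the patch: a positive return clears ackwait, anything
        # else means an ack must be awaited before resuming.
        self.ackwait = not (self.ioctl(self.fd, SEND_CHECKPOINT, 0) > 0)

    def preresume(self):
        # Block on the ack only if postsuspend left one outstanding.
        if self.ackwait:
            self.ioctl(self.fd, WAIT_CHECKPOINT_ACK, 0)
            self.ackwait = False
```

Passing a recording fake in place of fcntl.ioctl shows the order of requests: one SEND_CHECKPOINT per suspend, followed by at most one WAIT_CHECKPOINT_ACK before resume.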
Ian Jackson
2011-May-20 17:21 UTC
Re: [Xen-devel] [PATCH] remus: support DRBD disk backends
Shriram Rajagopalan writes ("[Xen-devel] [PATCH] remus: support DRBD disk backends"):
> remus: support DRBD disk backends
>
> DRBD disk backends can be used instead of tapdisk backends for Remus.
> This requires a Remus style disk replication protocol (asynchronous
> replication with output buffering at backup), that is not available in
> standard DRBD code. A modified version that supports this new replication
> protocol is available from git://aramis.nss.cs.ubc.ca/drbd-8.3-remus

Normally, "drbd:" disk strings would be handled by
/etc/xen/scripts/block-drbd, I think ?  Why does remus need to do
something different ?

The block device handling in libxl is in a bit of a state of flux but
perhaps it would be useful to start thinking about whether remus could
use it ?

Thanks,
Ian.
Shriram Rajagopalan
2011-May-21 01:23 UTC
Re: [Xen-devel] [PATCH] remus: support DRBD disk backends
On Fri, May 20, 2011 at 10:21 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Shriram Rajagopalan writes ("[Xen-devel] [PATCH] remus: support DRBD disk
> backends"):
> > remus: support DRBD disk backends
> >
> > DRBD disk backends can be used instead of tapdisk backends for Remus.
> > This requires a Remus style disk replication protocol (asynchronous
> > replication with output buffering at backup), that is not available in
> > standard DRBD code. A modified version that supports this new replication
> > protocol is available from git://aramis.nss.cs.ubc.ca/drbd-8.3-remus
>
> Normally, "drbd:" disk strings would be handled by
> /etc/xen/scripts/block-drbd, I think ?  Why does remus need to do
> something different ?

Yep. So, remus does use that. But DRBD does not have the replication
protocol that remus needs. Its asynchronous/synchronous replication does
not fit Remus' requirement of checkpoint-based replication, wherein the
backup has to "buffer" the disk writes in memory and release them only
when the checkpoint is being made.

So, the DRBD version supplied in the link above supports a new protocol D
that does just that. Now, in order to tell the backup that it can "flush"
the buffered disk writes to disk, some sort of signalling has to be done
from remus after the VM is suspended. And this patch does just that.

> The block device handling in libxl is in a bit of a state of flux but
> perhaps it would be useful to start thinking about whether remus could
> use it ?

Don't understand. I do realize that the libxl style of block device
handling is not stabilized, and it doesn't support DRBD at the moment. But
I am working on remus patches for libxl support :). I'll start a separate
discussion soon on that, but I wanted to get this one out of the way first.

thanks
shriram

> Thanks,
> Ian.
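The behaviour this reply attributes to protocol D (the backup holds replicated writes in memory and releases them to disk only at a checkpoint) can be pictured with a small in-memory model. This is a sketch of the idea only, not DRBD code: the class BackupWriteBuffer and its method names are hypothetical, the "disk" is a plain dict, and the failover() method reflects an assumption about Remus semantics (writes from an incomplete epoch are dropped) that the thread itself does not spell out.

```python
class BackupWriteBuffer:
    """Toy model of checkpoint-based write buffering at the backup."""

    def __init__(self):
        self.disk = {}      # sector number -> data already released to disk
        self.pending = []   # writes received since the last checkpoint

    def receive_write(self, sector, data):
        # Replicated writes are only queued; the disk is not touched yet.
        self.pending.append((sector, data))

    def checkpoint(self):
        # The primary signalled a checkpoint: release the buffered writes
        # in arrival order, then start a fresh epoch.
        for sector, data in self.pending:
            self.disk[sector] = data
        self.pending.clear()

    def failover(self):
        # Assumption: on primary failure, writes belonging to the
        # unfinished epoch are discarded so the disk matches the last
        # checkpoint.
        self.pending.clear()
```

In this model the signalling the patch adds corresponds to the call to checkpoint(): until it arrives, nothing the backup has received is visible on its disk.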
Shriram Rajagopalan
2011-May-22 17:21 UTC
Re: [Xen-devel] [PATCH] remus: support DRBD disk backends
On Fri, May 20, 2011 at 9:23 PM, Shriram Rajagopalan <rshriram@cs.ubc.ca> wrote:

> On Fri, May 20, 2011 at 10:21 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
>
>> Shriram Rajagopalan writes ("[Xen-devel] [PATCH] remus: support DRBD disk
>> backends"):
>> > remus: support DRBD disk backends
>> >
>> > DRBD disk backends can be used instead of tapdisk backends for Remus.
>> > This requires a Remus style disk replication protocol (asynchronous
>> > replication with output buffering at backup), that is not available in
>> > standard DRBD code. A modified version that supports this new replication
>> > protocol is available from git://aramis.nss.cs.ubc.ca/drbd-8.3-remus
>>
>> Normally, "drbd:" disk strings would be handled by
>> /etc/xen/scripts/block-drbd, I think ?  Why does remus need to do
>> something different ?
>
> Yep. So, remus does use that. But DRBD does not have the replication
> protocol that remus needs. Its asynchronous/synchronous replication does
> not fit Remus' requirement of checkpoint-based replication, wherein the
> backup has to "buffer" the disk writes in memory and release them only
> when the checkpoint is being made.
>
> So, the DRBD version supplied in the link above supports a new protocol D
> that does just that. Now, in order to tell the backup that it can "flush"
> the buffered disk writes to disk, some sort of signalling has to be done
> from remus after the VM is suspended. And this patch does just that.
>
>> The block device handling in libxl is in a bit of a state of flux but
>> perhaps it would be useful to start thinking about whether remus could
>> use it ?
>
> Don't understand. I do realize that the libxl style of block device
> handling is not stabilized, and it doesn't support DRBD at the moment. But
> I am working on remus patches for libxl support :). I'll start a separate
> discussion soon on that, but I wanted to get this one out of the way first.
>
> thanks
> shriram
>
>> Thanks,
>> Ian.

Ian, is there anything else you would like to know before pulling in this
patch?

shriram
Ian Jackson
2011-May-26 14:04 UTC
Re: [Xen-devel] [PATCH] remus: support DRBD disk backends
Shriram Rajagopalan writes ("[Xen-devel] [PATCH] remus: support DRBD disk backends"):
> remus: support DRBD disk backends

Committed, thanks.

Ian.