Guzovsky, Eduard
2008-Aug-25 23:25 UTC
[Xen-devel] backend block driver and domain migration blackout time
Hi,

We are working on reducing blackout time during domain migrations in xen
3.1.0. We ran into a problem where the backend block driver removes the
backend device without waiting for the main kernel thread created by the
driver to actually complete all outstanding io. This happens in the final
stages of the migration process, when the guest domain is being destroyed.

    static int blkback_remove(struct xenbus_device *dev)
    {
        struct backend_info *be = dev->dev.driver_data;

        ...
        if (kthread_remove(be))
            WPRINTK("BAD REMOVE REQUEST for %s\n", be->nodename);

        ...

        backend_release(be);

        return 0;
    }

If the kthread is still running and holds "be->refcnt", "backend_release()"
returns after

    if (!atomic_dec_and_test(&be->refcnt))
        return;

without reaching blkif_disconnect() to wait for io to complete. At this
point the backend disappears from the sysfs (and, I assume, xenstore) radar,
so there is no way to determine whether block io has actually completed on
the source before "unleashing" the migrated domain on the target system.

We added a little more "plumbing" to wait for the kthread to finish io
before calling "backend_release()". I am new to xen and would like to ask
whether this is actually a "safe" thing to do. It looks like
"blkback_remove()" is called by the xenwatch thread. I just want to make
sure that by blocking the xenwatch thread at this point I am not creating
potential deadlock cases - in some disk io failure scenarios the xenwatch
thread might get hung for a long period of time.

We also noticed that "unleashing" the domain on the target is gated by the
destruction of the domain on the source. Destruction of domains with lots
of memory is a very time consuming process. To speed things up we modified
the XendCheckpoint.py "save" function to do

    dominfo._releaseDevices(force = True)
    wait_for_outstanding_vbd_io_complete()   # "shutdown-done" in xenstore
    unleash_the_target()

just before the current

    dominfo.destroyDomain()
    dominfo.testDeviceComplete()

Both changes - the driver and xend - work fine in our testing. Being new to
xen I just want to double check that I am not missing something. The
complete patch is below.
Thanks,

-Ed

diff -ur a/drivers/xen/blkback/xenbus.c b/drivers/xen/blkback/xenbus.c
--- a/drivers/xen/blkback/xenbus.c	2008-08-24 12:24:56.000000000 -0400
+++ b/drivers/xen/blkback/xenbus.c	2008-08-24 16:05:07.000000000 -0400
@@ -38,6 +38,7 @@
 	char *mode;
 	int group_added;
 	char *nodename;
+	wait_queue_head_t wq;
 	atomic_t refcnt;
 	pid_t kthread_pid;
 	int shutdown_signalled;
@@ -220,7 +221,6 @@
 	blkif->remove_requested = 1;
 	wake_up_process(blkif->xenblkd);
-	blkif->xenblkd = NULL;

 	spin_unlock_irqrestore(&xenblkd_lock, flags);
@@ -296,6 +296,10 @@
 	if (kthread_remove(be))
 		WPRINTK("BAD REMOVE REQUEST for %s\n", be->nodename);

+	if (be->blkif) {
+		wait_event(be->wq, be->blkif->xenblkd == NULL);
+	}
+
 	down(&blkback_dev_sem);
 	xenvbd_sysfs_delif(dev);
 	dev->dev.driver_data = NULL;
@@ -311,9 +315,16 @@
  */
 void blkback_close(blkif_t *blkif)
 {
+	long flags;
+
 	blkif_disconnect(blkif);
 	vbd_sync(&blkif->vbd);
-	blkif->remove_requested = 0;
+
+	spin_lock_irqsave(&xenblkd_lock, flags);
+	blkif->xenblkd = NULL;
+	blkif->remove_requested = 0;
+	wake_up(&blkif->be->wq);
+	spin_unlock_irqrestore(&xenblkd_lock, flags);

 	down(&blkback_dev_sem);
 	if (blkif->be->dev)
@@ -386,6 +397,8 @@
 					 "allocating backend structure");
 		return -ENOMEM;
 	}
+
+	init_waitqueue_head(&be->wq);
 	be->dev = dev;
 	dev->dev.driver_data = be;
 	atomic_set(&be->refcnt, 1);
diff -ur ax/tools/python/xen/xend/XendCheckpoint.py bx/tools/python/xen/xend/XendCheckpoint.py
--- ax/tools/python/xen/xend/XendCheckpoint.py	2008-08-25 17:18:14.000000000 -0400
+++ bx/tools/python/xen/xend/XendCheckpoint.py	2008-08-25 17:18:40.000000000 -0400
@@ -55,7 +55,7 @@

     return buf

-def save(fd, dominfo, network, live, dst, checkpoint=False):
+def save(fd, dominfo, network, live, dst, checkpoint=False, sock=None):
     write_exact(fd, SIGNATURE, "could not write guest state file: signature")

     config = sxp.to_string(dominfo.sxpr())
@@ -124,8 +124,17 @@
         if checkpoint:
             dominfo.resumeDomain()
         else:
-            dominfo.destroyDomain()
-            dominfo.testDeviceComplete()
+            if sock is None:
+                dominfo.destroyDomain()
+                dominfo.testDeviceComplete()
+            else:
+                # handle migration - "Use the Force, Luke"
+                dominfo._releaseDevices(force = True)
+                dominfo.testDeviceComplete(io_done = True)
+                sock.shutdown(2)
+                dominfo.destroyDomain()
+                dominfo.testDeviceComplete()
+
         try:
             dominfo.setName(domain_name)
         except VmError:
diff -ur ax/tools/python/xen/xend/XendDomain.py bx/tools/python/xen/xend/XendDomain.py
--- ax/tools/python/xen/xend/XendDomain.py	2008-08-25 17:18:14.000000000 -0400
+++ bx/tools/python/xen/xend/XendDomain.py	2008-08-25 17:18:40.000000000 -0400
@@ -1242,7 +1242,8 @@
         try:
             sock.send("receive\n")
             sock.recv(80)
-            XendCheckpoint.save(sock.fileno(), dominfo, True, live, dst)
+            XendCheckpoint.save(sock.fileno(), dominfo, True, live, dst,
+                                sock=sock)
         finally:
             sock.close()
diff -ur ax/tools/python/xen/xend/XendDomainInfo.py bx/tools/python/xen/xend/XendDomainInfo.py
--- ax/tools/python/xen/xend/XendDomainInfo.py	2008-08-25 17:18:14.000000000 -0400
+++ bx/tools/python/xen/xend/XendDomainInfo.py	2008-08-25 17:18:40.000000000 -0400
@@ -1336,7 +1336,7 @@
         if self.image:
             self.image.createDeviceModel()

-    def _releaseDevices(self, suspend = False):
+    def _releaseDevices(self, suspend = False, force = False):
         """Release all domain's devices. Nothrow guarantee."""
         if suspend and self.image:
             self.image.destroy(suspend)
@@ -1347,7 +1347,7 @@
             for dev in t.list(devclass):
                 try:
                     log.debug("Removing %s", dev);
-                    self.destroyDevice(devclass, dev, False);
+                    self.destroyDevice(devclass, dev, force)
                 except:
                     # Log and swallow any exceptions in removal --
                     # there's nothing more we can do.
@@ -1907,7 +1907,7 @@
         except:
             log.exception("Unwatching VM path failed.")

-    def testDeviceComplete(self):
+    def testDeviceComplete(self, io_done = False):
         """ For Block IO migration safety we must ensure that
             the device has shutdown correctly, i.e. all blocks are
             flushed to disk
@@ -1920,19 +1920,34 @@
                test = 0
                for i in ctrlr.deviceIDs():
                    be = "%s/%s" % (bep, i)
-                    try:
-                        hs = xstransact.Read(be,"hotplug-status")
-                        if hs is None:
-                            # dir must have been deleted and recreated
-                            log.info("Dev %s deleted (dir exists)", i)
-                        else:
-                            # Hmm.. device really is still there
-                            log.info("Dev %s still exists, looping", i)
-                            test = 1
-                    except:
-                        # exception reading directory - must be
-                        # gone now...
-                        log.info("Dev %s deleted", i)
+                    if io_done:
+                        try:
+                            hs = xstransact.Read(be,"shutdown-done")
+                            if hs is None:
+                                # io is not completed yet
+                                log.info("Dev %s io not done, looping", i)
+                                test = 1
+                            else:
+                                #
+                                log.info("Dev %s io is done", i)
+                        except:
+                            # exception reading directory - must be
+                            # gone now...
+                            log.info("Dev %s deleted", i)
+                    else:
+                        try:
+                            hs = xstransact.Read(be,"hotplug-status")
+                            if hs is None:
+                                # dir must have been deleted and recreated
+                                log.info("Dev %s deleted (dir exists)", i)
+                            else:
+                                # Hmm.. device really is still there
+                                log.info("Dev %s still exists, looping", i)
+                                test = 1
+                        except:
+                            # exception reading directory - must be
+                            # gone now...
+                            log.info("Dev %s deleted", i)
                if test == 0:
                    break
Keir Fraser
2008-Aug-26 07:07 UTC
[Xen-devel] Re: backend block driver and domain migration blackout time
On 26/8/08 00:25, "Guzovsky, Eduard" <Eduard.Guzovsky@stratus.com> wrote:> > We are working on reducing blackout time during domain migrations in xen > 3.1.0. We ran into a problem when backend block driver removes the > backend device without waiting for the main kernel thread created by the > driver to actually complete all outstanding io. This happens in the > final stages of the migration process when guest domain is being > destroyed. > > static int blkback_remove(struct xenbus_device *dev) { > struct backend_info *be = dev->dev.driver_data; > > ... > if (kthread_remove(be)) > WPRINTK("BAD REMOVE REQUEST for %s\n", be->nodename); > > ... > > backend_release(be); > > return 0; > }I don''t recognise the code above from any version of our Linux patchset. I''m pretty sure that the races you mention, which might be present in the 3.1 Linux sparse tree, are fixed in our linux-2.6.18-xen.hg repository. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Guzovsky, Eduard
2008-Aug-26 16:36 UTC
[Xen-devel] RE: backend block driver and domain migration blackout time
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Tuesday, August 26, 2008 3:08 AM
> To: Guzovsky, Eduard; xen-devel@lists.xensource.com
> Cc: Murray, Kimball
> Subject: Re: backend block driver and domain migration blackout time
>
> On 26/8/08 00:25, "Guzovsky, Eduard" <Eduard.Guzovsky@stratus.com> wrote:
>
> > We are working on reducing blackout time during domain migrations in xen
> > 3.1.0. We ran into a problem where the backend block driver removes the
> > backend device without waiting for the main kernel thread created by the
> > driver to actually complete all outstanding io. This happens in the final
> > stages of the migration process, when the guest domain is being destroyed.
> >
> > static int blkback_remove(struct xenbus_device *dev)
> > {
> >     struct backend_info *be = dev->dev.driver_data;
> >
> >     ...
> >     if (kthread_remove(be))
> >         WPRINTK("BAD REMOVE REQUEST for %s\n", be->nodename);
> >
> >     ...
> >
> >     backend_release(be);
> >
> >     return 0;
> > }
>
> I don't recognise the code above from any version of our Linux patchset.
> I'm pretty sure that the races you mention, which might be present in the
> 3.1 Linux sparse tree, are fixed in our linux-2.6.18-xen.hg repository.
>
> -- Keir

Hi Keir,

Thank you for the quick response. You are correct, linux-2.6.18-xen.hg does
not have this race condition. It is a problem in the CentOS distribution
that we use.

What about the second part of the question - modifying xend to reduce
blackout time during domain migration? In xen 3.1.0 "unleashing" the domain
on the target is gated by the destruction of the domain on the source.
Destruction of domains with lots of memory is a very time consuming process.
To speed things up we added a separate "migration" case to the
XendCheckpoint.py "save" function that does

    dominfo._releaseDevices(force = True)
    wait_for_outstanding_vbd_io_complete()   # "shutdown-done" in xenstore
    unleash_the_target()

just before the current

    dominfo.destroyDomain()
    dominfo.testDeviceComplete()

The complete xend patch is below. Works fine in our testing but is this
a safe thing to do?
Thanks,

-Ed

diff -ur ax/tools/python/xen/xend/XendCheckpoint.py bx/tools/python/xen/xend/XendCheckpoint.py
--- ax/tools/python/xen/xend/XendCheckpoint.py	2008-08-25 17:18:14.000000000 -0400
+++ bx/tools/python/xen/xend/XendCheckpoint.py	2008-08-25 17:18:40.000000000 -0400
@@ -55,7 +55,7 @@

     return buf

-def save(fd, dominfo, network, live, dst, checkpoint=False):
+def save(fd, dominfo, network, live, dst, checkpoint=False, sock=None):
     write_exact(fd, SIGNATURE, "could not write guest state file: signature")

     config = sxp.to_string(dominfo.sxpr())
@@ -124,8 +124,17 @@
         if checkpoint:
             dominfo.resumeDomain()
         else:
-            dominfo.destroyDomain()
-            dominfo.testDeviceComplete()
+            if sock is None:
+                dominfo.destroyDomain()
+                dominfo.testDeviceComplete()
+            else:
+                # handle migration - "Use the Force, Luke"
+                dominfo._releaseDevices(force = True)
+                dominfo.testDeviceComplete(io_done = True)
+                sock.shutdown(2)
+                dominfo.destroyDomain()
+                dominfo.testDeviceComplete()
+
         try:
             dominfo.setName(domain_name)
         except VmError:
diff -ur ax/tools/python/xen/xend/XendDomain.py bx/tools/python/xen/xend/XendDomain.py
--- ax/tools/python/xen/xend/XendDomain.py	2008-08-25 17:18:14.000000000 -0400
+++ bx/tools/python/xen/xend/XendDomain.py	2008-08-25 17:18:40.000000000 -0400
@@ -1242,7 +1242,8 @@
         try:
             sock.send("receive\n")
             sock.recv(80)
-            XendCheckpoint.save(sock.fileno(), dominfo, True, live, dst)
+            XendCheckpoint.save(sock.fileno(), dominfo, True, live, dst,
+                                sock=sock)
         finally:
             sock.close()
diff -ur ax/tools/python/xen/xend/XendDomainInfo.py bx/tools/python/xen/xend/XendDomainInfo.py
--- ax/tools/python/xen/xend/XendDomainInfo.py	2008-08-25 17:18:14.000000000 -0400
+++ bx/tools/python/xen/xend/XendDomainInfo.py	2008-08-25 17:18:40.000000000 -0400
@@ -1336,7 +1336,7 @@
         if self.image:
             self.image.createDeviceModel()

-    def _releaseDevices(self, suspend = False):
+    def _releaseDevices(self, suspend = False, force = False):
         """Release all domain's devices. Nothrow guarantee."""
         if suspend and self.image:
             self.image.destroy(suspend)
@@ -1347,7 +1347,7 @@
             for dev in t.list(devclass):
                 try:
                     log.debug("Removing %s", dev);
-                    self.destroyDevice(devclass, dev, False);
+                    self.destroyDevice(devclass, dev, force)
                 except:
                     # Log and swallow any exceptions in removal --
                     # there's nothing more we can do.
@@ -1907,7 +1907,7 @@
         except:
             log.exception("Unwatching VM path failed.")

-    def testDeviceComplete(self):
+    def testDeviceComplete(self, io_done = False):
         """ For Block IO migration safety we must ensure that
             the device has shutdown correctly, i.e. all blocks are
             flushed to disk
@@ -1920,19 +1920,34 @@
                test = 0
                for i in ctrlr.deviceIDs():
                    be = "%s/%s" % (bep, i)
-                    try:
-                        hs = xstransact.Read(be,"hotplug-status")
-                        if hs is None:
-                            # dir must have been deleted and recreated
-                            log.info("Dev %s deleted (dir exists)", i)
-                        else:
-                            # Hmm.. device really is still there
-                            log.info("Dev %s still exists, looping", i)
-                            test = 1
-                    except:
-                        # exception reading directory - must be
-                        # gone now...
-                        log.info("Dev %s deleted", i)
+                    if io_done:
+                        try:
+                            hs = xstransact.Read(be,"shutdown-done")
+                            if hs is None:
+                                # io is not completed yet
+                                log.info("Dev %s io not done, looping", i)
+                                test = 1
+                            else:
+                                #
+                                log.info("Dev %s io is done", i)
+                        except:
+                            # exception reading directory - must be
+                            # gone now...
+                            log.info("Dev %s deleted", i)
+                    else:
+                        try:
+                            hs = xstransact.Read(be,"hotplug-status")
+                            if hs is None:
+                                # dir must have been deleted and recreated
+                                log.info("Dev %s deleted (dir exists)", i)
+                            else:
+                                # Hmm.. device really is still there
+                                log.info("Dev %s still exists, looping", i)
+                                test = 1
+                        except:
+                            # exception reading directory - must be
+                            # gone now...
+                            log.info("Dev %s deleted", i)
                if test == 0:
                    break
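For reference, the wait_for_outstanding_vbd_io_complete() step in the outline
above maps onto testDeviceComplete(io_done = True) in the patch. A minimal
standalone sketch of that polling loop might look like the following; it
assumes the backend writes a "shutdown-done" node under each vbd backend
directory once io has drained, as the patched testDeviceComplete() expects.
The helper name, the explicit timeout handling and the import path are
assumptions for illustration, not part of the patch.

    # Sketch of a "wait for outstanding vbd io" poll; helper name, timeout
    # handling and import path are illustrative.
    import time
    from xen.xend.xenstore.xstransact import xstransact

    def wait_for_outstanding_vbd_io_complete(backend_paths, interval = 0.1,
                                             timeout = 30.0):
        """Poll each vbd backend path until its shutdown-done node appears."""
        deadline = time.time() + timeout
        pending = set(backend_paths)          # e.g. ".../backend/vbd/<domid>/<devid>"
        while pending and time.time() < deadline:
            for be in list(pending):
                try:
                    if xstransact.Read(be, "shutdown-done") is not None:
                        pending.discard(be)   # backend reports io drained
                except:
                    pending.discard(be)       # backend dir gone - treat as done
            if pending:
                time.sleep(interval)
        return not pending                    # True if every device flushed its io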
Keir Fraser
2008-Aug-26 16:56 UTC
[Xen-devel] Re: backend block driver and domain migration blackout time
On 26/8/08 17:36, "Guzovsky, Eduard" <Eduard.Guzovsky@stratus.com> wrote:> dominfo._releaseDevices(force = True) > wait_for_outstanding_vbd_io_complete() # "shutdown-done" in xenstore > unleash_the_target() > > just before the current > > dominfo.destroyDomain() > dominfo.testDeviceComplete() > > The complete xend patch is below. Works fine in our testing but is this > a safe thing to do?As long as domain''s execution is quiesced (which it obviously is since it will have responded to the suspend request and the state record has been propagated to the target machine) and I/O is quiesced (well, block I/O at least) then yes, it should logically be safe. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Guzovsky, Eduard
2008-Aug-26 17:01 UTC
[Xen-devel] RE: backend block driver and domain migration blackout time
Thanks,

-Ed

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Tuesday, August 26, 2008 12:57 PM
> To: Guzovsky, Eduard; xen-devel@lists.xensource.com
> Cc: Murray, Kimball
> Subject: Re: backend block driver and domain migration blackout time
>
> On 26/8/08 17:36, "Guzovsky, Eduard" <Eduard.Guzovsky@stratus.com> wrote:
>
> > dominfo._releaseDevices(force = True)
> > wait_for_outstanding_vbd_io_complete()   # "shutdown-done" in xenstore
> > unleash_the_target()
> >
> > just before the current
> >
> > dominfo.destroyDomain()
> > dominfo.testDeviceComplete()
> >
> > The complete xend patch is below. Works fine in our testing but is this
> > a safe thing to do?
>
> As long as the domain's execution is quiesced (which it obviously is, since it
> will have responded to the suspend request and the state record has been
> propagated to the target machine) and I/O is quiesced (well, block I/O at
> least) then yes, it should logically be safe.
>
> -- Keir