We are seeing a disk corruption problem when migrating a VM between two nodes that are both active writers of a shared storage block device. The corruption appears to be caused by a lack of synchronization between the migration source and destination regarding outstanding block write requests.

The failing scenario is as follows:

1) The VM has block write A in progress on the source node X at the time it is being migrated.

2) The blkfront driver requeues A on the destination node Y after migration. Request A gets completed immediately, because the shared storage already has a request in flight to the same block (from X), so it ignores the new request.

3) New block write request A' is made from Y, now that the VM is running, to the same block number as A. Request A' gets completed immediately for the same reason as in #2.

The corruption we are seeing is that the block contains the data A, not A' as the VM expects. The problem is that the shared storage doesn't guarantee the outcome of the concurrent writes X->A and Y->A. It is choosing to ignore and immediately complete the second request, which I understand is one of the acceptable strategies for managing concurrent writes to the same block. That behavior is fine when the redundant request A is being ignored, but when the new request A' occurs, we get corruption.

The problem only shows up under heavy disk load (e.g. the Bonnie benchmark) while migrating, so most users probably haven't seen it. If I understand this correctly, though, it could affect anyone using shared block storage with dual active writers and live migration. When we run with a single active writer and then move the active writer to the destination node, all outstanding requests get flushed in the background and we don't see this problem.

The blkfront xenbus_driver doesn't have a "suspend" method. I was going to add one to flush the outstanding requests from the migration source to fix the problem (rough sketch in the P.S. below). Or maybe we can cancel all outstanding I/O requests to eliminate the concurrency between the two nodes. Does the Linux block I/O interface allow the canceling of requests?

Anyone else seeing this problem? Any other ideas for solutions?

Thanks,
Jeff
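P.S. For concreteness, here is the rough shape of the suspend hook I have in mind. This is only a sketch: blkfront_suspend() itself, blkif_stop_issuing(), blkif_requests_outstanding() and the drain_wq field are placeholder names I made up for illustration, not existing blkfront code.

#include <linux/wait.h>
#include <xen/xenbus.h>

/* Placeholder helpers -- these do not exist in blkfront today. */
static void blkif_stop_issuing(struct blkfront_info *info);
static int blkif_requests_outstanding(struct blkfront_info *info);

static int blkfront_suspend(struct xenbus_device *dev)
{
        struct blkfront_info *info = dev->dev.driver_data;

        /* Stop putting new requests on the shared ring. */
        blkif_stop_issuing(info);

        /*
         * Wait until the backend has responded to every request already
         * issued.  drain_wq would be a new wait queue added to
         * blkfront_info, woken from the response handler.
         */
        wait_event(info->drain_wq, blkif_requests_outstanding(info) == 0);

        return 0;
}

The idea is that the .suspend field of the blkfront xenbus_driver would point at this, so the save path drains the ring on the source before the domain is resumed on the destination.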
Ian Pratt
2006-Aug-21 19:58 UTC
RE: [Xen-devel] Shared disk corruption caused by migration
> The blkfront xenbus_driver doesn't have a "suspend" method. I was going to
> add one to flush the outstanding requests from the migration source to fix
> the problem. Or maybe we can cancel all outstanding I/O requests to
> eliminate the concurrency between the two nodes. Does the Linux block I/O
> interface allow the canceling of requests?
>
> Anyone else seeing this problem? Any other ideas for solutions?

There's already work in progress on this.

The simplest thing to do is to wait until the backend queues are empty before signalling the destination host to unpause the relocated domain. However, this would add to migration downtime. It would be nice if we could quickly cancel the IOs queued at the original host, but Linux doesn't have a good mechanism for this.

For targets that support fencing it's possible to quickly and synchronously fence the original host. For other targets, we need to be a bit cunning to minimize downtime: we can actually start running the VM on the destination host before we've had the 'all queues empty' message from the source host. We just have to be careful to make sure that we don't issue any writes to blocks that also potentially still have writes pending on them in the source host. If such a write occurs, we have to stall issuing of the write until we receive the 'all queues empty' from the source host. However, such conflicting writes are actually pretty unusual, so the majority of relocations won't incur the stall.

Stay tuned for a patch.

Ian
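P.S. To make the conflict-tracking idea a little more concrete, here is a rough sketch of the destination-side bookkeeping. All names below (pending_conflict_state, sectors_overlap_pending() and so on) are illustrative only and are not taken from the forthcoming patch.

#include <linux/rbtree.h>
#include <linux/types.h>
#include <linux/wait.h>

/*
 * Sectors that still had writes in flight on the source when the domain
 * was resumed here, populated from the migration metadata.
 */
struct pending_conflict_state {
        struct rb_root          pending;        /* sectors pending on the source */
        bool                    source_drained; /* 'all queues empty' received?  */
        wait_queue_head_t       drain_wq;
};

/* Placeholder: returns true if [sector, sector + nr) overlaps 'pending'. */
static bool sectors_overlap_pending(struct pending_conflict_state *st,
                                    sector_t sector, unsigned int nr);

/* Called on the destination's write issue path (locking omitted). */
static void maybe_stall_write(struct pending_conflict_state *st,
                              sector_t sector, unsigned int nr)
{
        if (st->source_drained)
                return;         /* source already flushed: common fast path */

        if (!sectors_overlap_pending(st, sector, nr))
                return;         /* no conflict with the source: issue now */

        /* Conflicting write: stall until the source reports its queues empty. */
        wait_event(st->drain_wq, st->source_drained);
}

/* Called when the 'all queues empty' message arrives from the source. */
static void source_queues_empty(struct pending_conflict_state *st)
{
        st->source_drained = true;
        wake_up_all(&st->drain_wq);
}

Since the overlap check almost never fires, the common case pays only the cost of the lookup, which is why most relocations should see no added downtime.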