Antibozo
2008-Jul-31 21:30 UTC
[Xen-users] drbd 8 primary/primary and xen migration on RHEL 5
Greetings. I've reviewed the list archives, particularly the posts from Zakk, on this subject, and found results similar to his. drbd provides a block-drbd script, but with full virtualization, at least on RHEL 5, this does not work; by the time the block script is run, the qemu-dm has already been started.

Instead I've simply been considering the possibility of keeping the drbd devices in primary/primary state at all times. I'm concerned about a race condition, however, and want to ask whether others have examined this alternative.

I am thinking of a scenario where the vm is running on node A and has a process that is writing to disk at full speed, so that the drbd device on node B is lagging. If I perform a live migration from node A to B under this condition, the local device on node B might not be in sync at the time the vm is started on that node. Maybe.

If I use drbd protocol C, theoretically at least, a sync on the device on node A shouldn't return until node B is fully in sync. So I guess my main question is: during migration, does xend force a device sync on node A before the vm is started on node B?

A secondary question I have (and this may be a question for the drbd folks as well) is: why is the block-drbd script necessary? I.e. why not simply leave the drbd devices primary/primary at all times--what benefit is there to marking the device secondary on the standby node?

Or am I just very confused? Does anyone else have thoughts or experience on this matter? All responses are appreciated.
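For concreteness, the two styles of disk line I mean look roughly like this (resource and device names are made up, and the drbd: syntax only works if the block-drbd script is installed under /etc/xen/scripts):

    # block-drbd style: xend invokes the block-drbd helper, which is supposed
    # to promote the resource to primary before the domU sees the disk; with
    # HVM on RHEL 5, qemu-dm is already running by the time that happens.
    disk = [ 'drbd:vm01,hda,w' ]

    # plain phy: style against a device kept primary/primary; no helper
    # script runs, so nothing ever promotes or demotes the resource.
    disk = [ 'phy:/dev/drbd0,hda,w' ]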
nathan@robotics.net
2008-Jul-31 21:58 UTC
Re: [Xen-users] drbd 8 primary/primary and xen migration on RHEL 5
I am running DRBD primary/primary on Centos 5.2 with CLVM and GFS with no problems. The only issue I have with live migration is that the arp takes 10-15 sec to get refreshed, so you lose connectivity during that time. I have the problem with 3.0ish xen on Centos 5.2 as well as xen 3.2.1.

Anyway, other than the ARP issue, I have this working in production with about two dozen DomUs.

Note: If you want to use LVM for xen rather than files on GFS/LVM/DRBD, you need to run the latest DRBD that supports max-bio-bvecs.

><>
Nathan Stratton
CTO, BlinkMind, Inc.
nathan at robotics.net          nathan at blinkmind.com
http://www.robotics.net        http://www.blinkmind.com
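For reference, that option goes in the disk section of drbd.conf, something like the following (the resource name is made up, and 1 is just the value usually suggested; check the drbd.conf man page for your version):

    resource vm01 {
      disk {
        max-bio-bvecs 1;   # limit bios to a single bvec; works around LVM/Xen stacking issues
      }
      # ... usual net, syncer, and on <host> sections omitted ...
    }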
Antibozo
2008-Jul-31 23:24 UTC
Re: [Xen-users] drbd 8 primary/primary and xen migration on RHEL 5
On 2008-07-31 21:58, nathan@robotics.net wrote:
> I am running DRBD primary/primary on Centos 5.2 with CLVM and GFS with
> no problems. The only issue I have with live migration is that the arp
> takes 10-15 sec to get refreshed, so you lose connectivity during that
> time. I have the problem with 3.0ish xen on Centos 5.2 as well as xen
> 3.2.1.

One can run a job on the vm to generate a packet every second or two to resolve this; ping in a loop should do it.

My scenario doesn't involve any clustered filesystem. I'm using phy: drbd devices as the backing for the vm, not files. As far as I understand things, a clustered filesystem shouldn't be necessary, as long as the drbd devices are in sync at the moment migration occurs. But the question remains whether that condition is guaranteed, and I hope to hear from someone who knows the answer to that question...

> Anyway, other than the ARP issue, I have this working in production with
> about two dozen DomUs.
>
> Note: If you want to use LVM for xen rather than files on GFS/LVM/DRBD,
> you need to run the latest DRBD that supports max-bio-bvecs.

I'm actually running drbd on top of LVM, but I'll look into the max-bio-bvecs thing anyway out of curiosity. Thanks for the reply.
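By "ping in a loop" I mean nothing fancier than something like the following, run on the vm (the target address is a placeholder; the default gateway is a fine choice):

    while true; do ping -c 1 -w 1 10.0.0.1 >/dev/null 2>&1; sleep 1; done &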
Antibozo
2008-Jul-31 23:43 UTC
Re: [Xen-users] drbd 8 primary/primary and xen migration on RHEL 5
On 2008-07-31 23:24, Antibozo wrote:
> One can run a job on the vm to generate a packet every second or two to
> resolve this; ping in a loop should do it.

Quick follow-up:

    arping -b -A *ip-address-of-vm*

seems to work pretty well. I get a 2-3 second dropout if I'm running this during live migration.

--
Jefferson Ogata : Internetworker, Antibozo <ogata@antibozo.net>
http://www.antibozo.net/ogata/
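(Note for anyone copying this: depending on which arping you have, you may need to name the interface explicitly, and you can bound the run with a count; the interface and address below are placeholders.)

    arping -I eth0 -c 10 -b -A 10.0.0.5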
nathan@robotics.net
2008-Jul-31 23:58 UTC
Re: [Xen-users] drbd 8 primary/primary and xen migration on RHEL 5
On Thu, 31 Jul 2008, Antibozo wrote:
> Quick follow-up:
>
>     arping -b -A *ip-address-of-vm*
>
> seems to work pretty well. I get a 2-3 second dropout if I'm running this
> during live migration.

Odd, I am seeing 15-20 sec. How on earth does someone get the 165 ms that is talked about? I get the same delay whether I am migrating over gig-e or 20-gig InfiniBand. It also should not be my I/O subsystem, as I am getting 400 MB/s writes and 550 MB/s reads.

><>
Nathan Stratton
CTO, BlinkMind, Inc.
nathan at robotics.net          nathan at blinkmind.com
http://www.robotics.net        http://www.blinkmind.com
Antibozo
2008-Aug-03 21:55 UTC
[Xen-users] Re: drbd 8 primary/primary and xen migration on RHEL 5
On 2008-07-31 21:30, Antibozo wrote:
> I've reviewed the list archives, particularly the posts from Zakk, on
> this subject, and found results similar to his. drbd provides a
> block-drbd script, but with full virtualization, at least on RHEL 5,
> this does not work; by the time the block script is run, the qemu-dm
> has already been started.

I've developed a workaround for all of this, in the form of a wrapper script for qemu-dm. This is trickier than it might seem at first blush, because of the way that xend uses signals to communicate with qemu-dm. The wrapper script can be used in the "model =" line of a vm definition, and will take care of assuring consistency of the drbd resource(s) for a vm during reboots, migration, etc.

The script can be found here:

    http://www.antibozo.net/xen/qemu-dm.drbd

The strategy is detailed in the script comments; please review those if you want details. The principal objective is prevention of split brain. If you want to use Xen on top of drbd for high availability, this is a decent first cut, as far as I can tell. Feedback is welcome.

> Instead I've simply been considering the possibility of keeping the drbd
> devices in primary/primary state at all times. I'm concerned about a
> race condition, however, and want to ask whether others have examined
> this alternative.

I've moved away from this strategy, and am keeping resources secondary when a vm isn't using them. This lets the remote node tell whether a vm is already running on a drbd resource by inspecting the peer's primary/secondary status (the wrapper script does this), which makes it difficult, though not impossible, to accidentally fire up a vm using a resource that is already in use by a vm on the remote node.

I've also discovered that primary/primary mode is not actually needed, at least for HVM vms using Xen 3.0.3 as shipped on RHEL 5. The conventional wisdom was that primary/primary is necessary during migration, but with the appropriate wrapper around qemu-dm, we can wait for the peer to go secondary before going primary on the local node.

One way you can still get yourself pretty hosed (if you're determined to do so) is the following:

- Start the vm on node A. The wrapper makes the drbd resource primary, and the vm starts running.
- Start the vm on node B. This creates the vm instance, but the wrapper blocks waiting for the drbd resource on node A to go secondary.
- Start a migration from node A to B. This freaks xend out, since node B already has a vm with the same name (even though it isn't actually running yet).

In this scenario, you may end up having to reboot node B because the xen store gets crufty. But you still should never end up with a split-brain condition.

Obviously you could also get hosed if your nodes can't talk to one another and you start the same vm on both nodes. This is classic split brain. In that case, drbd should refuse to resync when drbd connectivity is restored, and you'll have to kill one of the vm instances, invalidate the local drbd resource, and resync, after which things should be fine. I haven't tested this scenario yet, so YMMV.

> I am thinking of a scenario where the vm is running on node A and has a
> process that is writing to disk at full speed, so that the drbd device
> on node B is lagging. If I perform a live migration from node A to B
> under this condition, the local device on node B might not be in sync
> at the time the vm is started on that node. Maybe.

I have done some testing of heavy disk i/o during live migration, and things appear to remain fully consistent. Note that an i/o stack of filesystem on top of LVM volume, on top of xen, on top of drbd, on top of LVM volume is not super fast: I see 10-20 MB/s with new block allocation on a 4-core PowerEdge 1950 using SAS disks (with one CPU allocated to the vm). So don't plan on that particular architecture for your heavily used RDBMS.

> If I use drbd protocol C, theoretically at least, a sync on the device
> on node A shouldn't return until node B is fully in sync. So I guess my
> main question is: during migration, does xend force a device sync on
> node A before the vm is started on node B?

By all appearances (empirically), yes. And since this qemu-dm wrapper also waits for secondary state on the peer, and UpToDate state on the local copy, before actually invoking the real qemu-dm, I believe we are covered.

--
Jefferson Ogata : Internetworker, Antibozo
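For anyone who doesn't want to read the whole script, the core idea is roughly the following. This is only a sketch, not the actual script: resource discovery, signal forwarding from xend, demotion back to secondary after qemu-dm exits, and error handling are all glossed over, and the qemu-dm path may differ on your system.

    #!/bin/sh
    # Sketch only -- the real wrapper linked above handles xend's signals,
    # multiple resources, and demotion after qemu-dm exits.
    RES=r0                                 # drbd resource backing this vm (name made up)
    REAL_QEMU=/usr/lib64/xen/bin/qemu-dm   # adjust for your distribution

    # Wait until the peer is no longer primary and the local disk is UpToDate.
    while :; do
        role=$(drbdadm role "$RES")        # e.g. "Secondary/Secondary"
        dstate=$(drbdadm dstate "$RES")    # e.g. "UpToDate/UpToDate"
        case "$role" in
            */Secondary) ;;                # peer has let go of the resource
            *) sleep 1; continue ;;
        esac
        case "$dstate" in
            UpToDate/*) break ;;           # local copy is good; safe to promote
            *) sleep 1 ;;
        esac
    done

    drbdadm primary "$RES" || exit 1       # go primary here, then hand off
    exec "$REAL_QEMU" "$@"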
Antibozo
2008-Aug-03 22:00 UTC
Re: [Xen-users] drbd 8 primary/primary and xen migration on RHEL 5
On 2008-07-31 23:58, nathan@robotics.net wrote:
> Odd, I am seeing 15-20 sec. How on earth does someone get the 165 ms
> that is talked about? I get the same delay whether I am migrating over
> gig-e or 20-gig InfiniBand. It also should not be my I/O subsystem, as
> I am getting 400 MB/s writes and 550 MB/s reads.

I assume the 165 ms statistic is actual downtime of the vm during live migration, and that delays in the network switching paths are a completely external matter. Are you sure the 15-20 seconds you're seeing are actual downtime? If you run "while true ; do date ; sleep 1 ; done" on the vm while it's migrating, do you see a 15-20 second gap in the output (once it's visible to you)?

I wonder if your IP switches have some sort of arp flap limiting going on. You might try disabling spanning tree if that's enabled.

--
Jefferson Ogata : Internetworker, Antibozo
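For what it's worth, if you want finer resolution than one second, a couple of variants (the address is a placeholder, and nothing here is Xen-specific):

    # on the vm: any gap in the timestamps is actual vm downtime
    while true; do date '+%H:%M:%S.%N'; sleep 0.1; done

    # from a third host: gaps here also include ARP/switch-path convergence
    ping -i 0.2 10.0.0.5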