Hello list,

I'm reading a lot about Xen stub domains, but I'm wondering if I can use a
Linux stubdom to serve a "transformed" block device to the corresponding domU.
The wiki page(s) and the stubdom directory in the source code leave a lot of
questions open, which I hope someone here can answer.

So my questions are:
* What are the requirements to run Linux inside a stubdom? Is a current pvops
  kernel enough, or does the Linux kernel have to be modified for a stubdom?
  If this works, I would prepare a kernel and a minimal rootfs within an
  initrd to set up my block device for the domU.
* How can I offer a block device (created within the stubdom) from the stubdom
  to the domU? Are there any docs on how to configure this?
* If the above is not possible, how could I offer a block device from one domU
  to another domU as a block device? Are there any docs on how to do this?

What I'm trying to do:
* In my case, I make logical volumes available on all hosts with the same path
  on each host. So I can assemble a software RAID1 where each device lives on
  a different server, without losing the possibility of live migration.
* Configure a domU with two block devices; the corresponding (Linux) stubdom
  assembles a software RAID1 (Linux md device) and presents this md device to
  the domU. So the domU doesn't have to handle anything related to the
  software RAID1 but has ONE redundant block device for its use.
* I have two use cases. One is an HVM domU running something like Windows;
  because of the lack of (good) software RAID1 there, I use the Linux software
  RAID1 for backing the block device inside the stubdom.
  The other use case is a PV domU where the admin of the virtualization
  environment is not the admin of the domU and therefore isn't able to manage
  the software RAID1 inside the domU. So the stubdom could be used to manage
  the software RAID1 without interfering with the domU.

--
greetings eMHa
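P.S. To make the idea concrete, what I imagine the stubdom doing with its two
block devices is roughly the following (untested sketch; device and array
names are only placeholders):

  # inside the helper domain, which sees the two mirror halves as xvda/xvdb
  mdadm --create /dev/md/mydomu --level=1 --raid-devices=2 /dev/xvda /dev/xvdb
  # (on later starts only re-assemble instead of creating:)
  mdadm --assemble /dev/md/mydomu /dev/xvda /dev/xvdb
  # /dev/md/mydomu would then be exported to the real domU as its single,
  # redundant block device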
Hi,

On 2013-01-29 16:46:40 Markus Hochholdinger wrote:
> [...]
> What I'm trying to do:
> * In my case, I make logical volumes available on all hosts with the same
>   path on each host. So I can assemble a software RAID1 where each device
>   lives on a different server, without losing the possibility of live
>   migration.
> * Configure a domU with two block devices; the corresponding (Linux) stubdom
>   assembles a software RAID1 (Linux md device) and presents this md device
>   to the domU. So the domU doesn't have to handle anything related to the
>   software RAID1 but has ONE redundant block device for its use.
> * I have two use cases. One is an HVM domU running something like Windows;
>   because of the lack of (good) software RAID1 there, I use the Linux
>   software RAID1 for backing the block device inside the stubdom.
>   The other use case is a PV domU where the admin of the virtualization
>   environment is not the admin of the domU and therefore isn't able to
>   manage the software RAID1 inside the domU. So the stubdom could be used to
>   manage the software RAID1 without interfering with the domU.

Out of curiosity:
Is there anything I am missing as to why what you're trying to achieve cannot
be done in dom0 alone?
- I suppose you could create an LVM2 mirror on top of the remote volumes
  (assuming using iSCSI volumes as LVM2 PVs is possible).
- Did you consider DRBD? It seems to me it provides what you need.

Sorry, though, I cannot help you with stubdoms.

- peter.
On Tue, 2013-01-29 at 15:46 +0000, Markus Hochholdinger wrote:
> Hello list,
>
> I'm reading a lot about Xen stub domains, but I'm wondering if I can use a
> Linux stubdom to serve a "transformed" block device to the corresponding
> domU. The wiki page(s) and the stubdom directory in the source code leave a
> lot of questions open, which I hope someone here can answer.

I think the thing you are looking for is a "driver domain" rather than a
stubdomain, http://wiki.xen.org/wiki/Driver_Domain. You'll likely find more
useful info if you google for that rather than stubdomain.

> So my questions are:
> * What are the requirements to run Linux inside a stubdom? Is a current
>   pvops kernel enough, or does the Linux kernel have to be modified for a
>   stubdom? If this works, I would prepare a kernel and a minimal rootfs
>   within an initrd to set up my block device for the domU.

You can use Linux as a block driver storage domain, yes.

> * How can I offer a block device (created within the stubdom) from the
>   stubdom to the domU? Are there any docs on how to configure this?
> * If the above is not possible, how could I offer a block device from one
>   domU to another domU as a block device? Are there any docs on how to do
>   this?

Since a driver domain is also a domU (just one which happens to provide
services to other domains) these are basically the same question. People more
often do this with network driver domains, but block ought to be possible too
(although there may be a certain element of having to take the pieces and
build something yourself).

Essentially you just need to a) make the block device bits available in the
driver domain, b) run blkback (or some other block backend) in the driver
domain and c) tell the toolstack that a particular device is provided by the
driver domain when you build the guest.

For a) one would usually use PCI passthrough to pass a storage controller to
the driver domain and use the regular drivers in there. But you could also
use e.g. iSCSI or NFS (I guess). If you want to also use this controller for
dom0's disks then that's a bit more complex...

For b) that's just a case of compiling in the appropriate driver and
installing the appropriate hotplug scripts in the domain.

For c) I'm not entirely sure how you do that with either xend or xl in
practice. I know there have been some patches on xen-devel not so long ago to
improve things for xl support of disk driver domains. It is possible that you
might need to hack the toolstack a bit to get this to work, and depending on
how and when the disk images are constructed you may need some out-of-band
communication between the toolstack domain and the driver domain to actually
create the underlying devices. (A rough example of what c) might look like
with xl is at the bottom of this mail.)

The biggest problem I can see is supporting Windows HVM, since the device
model also needs to have access to the disk in order to provide the emulated
devices (at least initially, hopefully you have PV drivers). The usual way to
do this is to attach a PV device to the domain running the device model where
the backend is supported by the driver domain as well. Again you might need
to hack up a few things to get this working.

> What I'm trying to do:
> [...]
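For c), assuming xl and a driver domain called (say) "storagedom" which
exports the assembled md device, the disk line in the guest config might look
something like this -- untested, the names are made up and the exact keys
depend on your xl version:

  # in the guest's xl config
  disk = [ 'backend=storagedom, vdev=xvda, format=raw, target=/dev/md/mydomu' ]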
Hi,

On 29.01.2013 at 17:15, "Peter Gansterer" <peter.gansterer@paradigma.net>
wrote:
> On 2013-01-29 16:46:40 Markus Hochholdinger wrote:
[..]
> Out of curiosity:
> Is there anything I am missing as to why what you're trying to achieve
> cannot be done in dom0 alone?
> - I suppose you could create an LVM2 mirror on top of the remote volumes
>   (assuming using iSCSI volumes as LVM2 PVs is possible).
> - Did you consider DRBD? It seems to me it provides what you need.

Well, if I create/assemble an md device in one dom0, I can't live migrate the
domU. If I create/assemble the md device simultaneously on the destination
dom0, I risk data corruption.

If I use DRBD, the performance is not that good and I'm limited to two dom0s
(OK, with newer DRBD I can have multiple slaves without stacking). And I have
the problem of split-brain situations.

As my tests have shown, I get the most out of the hardware if I use a
software RAID1 inside Linux domUs. The local logical volumes are exported
over iSCSI to the other dom0s, and every logical volume has the same symlink
in /dev on each dom0. The other reason is that I have no problem with split
brain: if the domU doesn't run, the software RAID1 doesn't run either.

I have had this setup, with Linux domUs and software RAID1 inside the domUs,
successfully in production since 2006, but it has the limitation that the
software RAID1 has to be managed inside the domUs. Now I'm searching for
solutions where the software RAID1 is not inside the domU but very near the
domU, like in a stubdom.

--
greetings eMHa
Hello,

On 29.01.2013 at 17:36, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2013-01-29 at 15:46 +0000, Markus Hochholdinger wrote:
> > I'm reading a lot about Xen stub domains, but I'm wondering if I can use
> > a Linux stubdom to serve a "transformed" block device to the
> > corresponding domU. [...]
> I think the thing you are looking for is a "driver domain" rather than a
> stubdomain, http://wiki.xen.org/wiki/Driver_Domain. You'll likely find
> more useful info if you google for that rather than stubdomain.

A driver domain is a great thing, but I'm wondering if I can live migrate a
domU together with its driver domU. In the domU config I have to use the
$domid of the driver domain. This $domid is probably wrong after a live
migration of the domU, and I would have to migrate the driver domU at the
same time as the domU. If the driver domU is meant to stay on one dom0,
there's no difference to doing this in dom0, and that doesn't work for me.

Do you know how I can live migrate a domU which depends on a driver domU?
How can I migrate the driver domU? To my understanding, the block device has
to be there on the destination dom0 before live migration begins, but it is
also used on the source dom0 by the migrating, still running, domU.

Can I combine a driver domU with a normal domU like I can combine a stubdom
with a normal domU? I thought a stubdom live migrates with its domU, so you
don't have to worry that the driver domU live migrates while the according
normal domU migrates.

> > So my questions are:
> > * What are the requirements to run Linux inside a stubdom? [...]
> You can use Linux as a block driver storage domain, yes.

OK, I see. But still, would it be possible to run Linux in a stub-domain?
I've read e.g. http://blog.xen.org/index.php/2012/12/12/linux-stub-domain/
which describes this, but I'm unsure if this will be (or already is)
supported by current Xen?

> > * How can I offer a block device (created within the stubdom) from the
> >   stubdom to the domU? [...]
> > * If the above is not possible, how could I offer a block device from
> >   one domU to another domU as a block device? [...]
> Since a driver domain is also a domU (just one which happens to provide
> services to other domains) these are basically the same question. People
> more often do this with network driver domains, but block ought to be
> possible too (although there may be a certain element of having to take
> the pieces and build something yourself).

Is live migration possible with these driver domUs? What requirements are
needed so that I can live migrate a domU which depends on a driver domU?

> Essentially you just need to a) make the block device bits available in
> the driver domain, b) run blkback (or some other block backend) in the
> driver domain and c) tell the toolstack that a particular device is
> provided by the driver domain when you build the guest.

Yes, I would provide two block devices (logical volumes) to the driver domU,
create a software RAID1 device there and make the md device available with
blkback. I would do this for each domU so I can live migrate the domUs
independently. The driver domU only needs a kernel and an initrd with a
rootfs containing just enough to build the md device and export it with
blkback. (A rough sketch of such a driver domU config is at the end of this
mail.)

But how can I address this exported block device? As far as I've seen, I
need the $domid of the driver domain in the config file of my domU, or am I
missing something?

> For a) one would usually use PCI passthrough to pass a storage controller
> to the driver domain and use the regular drivers in there. But you could
> also use e.g. iSCSI or NFS (I guess). If you want to also use this
> controller for dom0's disks then that's a bit more complex...

Because I'd like to "migrate" the driver domain together with my normal
domU, I wouldn't do any PCI passthrough but only provide the logical volumes
backing the block device of one domU.

> For b) that's just a case of compiling in the appropriate driver and
> installing the appropriate hotplug scripts in the domain.
> For c) I'm not entirely sure how you do that with either xend or xl in
> practice. I know there have been some patches on xen-devel not so long ago
> to improve things for xl support of disk driver domains. It is possible
> that you might need to hack the toolstack a bit to get this to work, and
> depending on how and when the disk images are constructed you may need
> some out-of-band communication between the toolstack domain and the driver
> domain to actually create the underlying devices.

I'll test a driver domain so I can see what works for me and what doesn't.

> The biggest problem I can see is supporting Windows HVM, since the device
> model also needs to have access to the disk in order to provide the
> emulated devices (at least initially, hopefully you have PV drivers). The
> usual way to do this is to attach a PV device to the domain running the
> device model where the backend is supported by the driver domain as well.
> Again you might need to hack up a few things to get this working.

So this would be a dependency that the driver domain is started before the
stubdom with qemu. In some stubdom startup script I saw the parameter
"target" while creating the stubdom. Is this a way to combine two domUs?

As far as I understand, with a driver domU I need the $domid of the driver
domU in my config, so this is the connection between the two domUs. But with
stub-domains I haven't understood how data flows between stubdom and domU,
and because I've seen a lot of nice little pictures describing that I/O
flows from the domU over the stubdom to dom0 and back, I thought stub-domains
would be the way to go.

Many thanks so far for your information.

> What I'm trying to do:
> [...]

--
greetings eMHa
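P.S. What I have in mind for the per-domU driver domU is just something like
the following (untested sketch; names and paths are only placeholders):

  # config of the helper/driver domU "mydomu-md": it sees the two mirror
  # halves as plain virtual disks and assembles the md device from them
  name    = "mydomu-md"
  memory  = 128
  kernel  = "/boot/vmlinuz-driverdomu"
  ramdisk = "/boot/initrd-driverdomu"
  disk    = [ 'phy:/dev/xbd/mydomu.node1,xvda,w',
              'phy:/dev/xbd/mydomu.node2,xvdb,w' ]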
On Tue, 2013-01-29 at 19:32 +0000, Markus Hochholdinger wrote:
> Hello,
>
> On 29.01.2013 at 17:36, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > I think the thing you are looking for is a "driver domain" rather than a
> > stubdomain, http://wiki.xen.org/wiki/Driver_Domain. [...]
>
> A driver domain is a great thing, but I'm wondering if I can live migrate
> a domU together with its driver domU. In the domU config I have to use the
> $domid of the driver domain. This $domid is probably wrong after a live
> migration of the domU, and I would have to migrate the driver domU at the
> same time as the domU. If the driver domU is meant to stay on one dom0,
> there's no difference to doing this in dom0, and that doesn't work for me.

The change of $domid doesn't matter, since a migration involves reconnecting
the devices anyway, which means they will reconnect to the "new" driver
domain.

The normal way would be to have a driver domain per host, but there's no
reason you couldn't make it such that the driver domain was migrated too
(you'd have to do some hacking to make this work).

AIUI you currently have a RAID1 device in the guest, presumably constructed
from 2 xvd* devices? What are those two xvd* devices backed by? I presume it
must be some sort of network storage (NFS, iSCSI, NBD, DRBD) or else you just
couldn't migrate.

Are you intending to instead run the RAID1 device in a "driver domain",
constructed from 2 xvd* devices exported from dom0, and export that as a
single xvd* device to the guest? Or are you intending to surface the network
storage directly into the driver domain, construct the RAID device from
those and export that as an xvd* to the guest?

> Do you know how I can live migrate a domU which depends on a driver domU?
> How can I migrate the driver domU?
> To my understanding, the block device has to be there on the destination
> dom0 before live migration begins, but it is also used on the source dom0
> by the migrating, still running, domU.

Not quite: when you migrate there is a pause period while the final copy over
occurs, and at this point you can safely remove the device from the source
host and make it available on the target host. The toolstack will ensure that
the block device is only ever active on one end or the other and never on
both -- otherwise you would get potential corruption.

While you could migrate the driver domain during the main domU's pause
period, it is much more normal to simply have a driver domain on each host
and dynamically configure the storage as you migrate.

> Can I combine a driver domU with a normal domU like I can combine a
> stubdom with a normal domU?

If you want, but it would be more typical to have a single driver domain
providing block services to all domains (or one per underlying physical block
device).

> I thought a stubdom live migrates with its domU, so you don't have to
> worry that the driver domU live migrates while the according normal domU
> migrates.

A stubdom is a bit of an overloaded term. If you mean an ioemu stub domain
(i.e. the qemu associated with an HVM guest) then a new one of those is
started on the target host each time you migrate. If you mean a xenstored
stubdom then those are per host and are not migrated. And if you mean a
driver domain then, as I say, those are usually per host and the domain will
be connected to the appropriate local driver domain on the target host.

> > > So my questions are:
> > > * What are the requirements to run Linux inside a stubdom? [...]
> > You can use Linux as a block driver storage domain, yes.
>
> OK, I see. But still, would it be possible to run Linux in a stub-domain?
> I've read e.g.
> http://blog.xen.org/index.php/2012/12/12/linux-stub-domain/ which
> describes this, but I'm unsure if this will be (or already is) supported
> by current Xen?

This work is about a Linux ioemu stub domain. That is a stubdomain with the
sole purpose of running the qemu emulation process for an HVM domain. I think
the intention is for this to land in Xen 4.3, but it does not have anything
to do with your use case AFAICT.

Everything you want to do is already possible with what is in Xen and Linux
today, in that the mechanisms all exist. However, what you are doing is not
something which others have done, so there will necessarily need to be a
certain amount of putting the pieces together on your part.

> > Since a driver domain is also a domU (just one which happens to provide
> > services to other domains) these are basically the same question. [...]
>
> Is live migration possible with these driver domUs? What requirements are
> needed so that I can live migrate a domU which depends on a driver domU?
>
> > Essentially you just need to a) make the block device bits available in
> > the driver domain, b) run blkback (or some other block backend) in the
> > driver domain and c) tell the toolstack that a particular device is
> > provided by the driver domain when you build the guest.
>
> Yes, I would provide two block devices (logical volumes) to the driver
> domU,

How are you doing this? Where do those logical volumes come from and how are
they getting into the driver domU?

> create a software RAID1 device there and make the md device available
> with blkback. I would do this for each domU so I can live migrate the
> domUs independently. The driver domU only needs a kernel and an initrd
> with a rootfs containing just enough to build the md device and export it
> with blkback.
>
> But how can I address this exported block device? As far as I've seen, I
> need the $domid of the driver domain in the config file of my domU, or am
> I missing something?

$domid can also be a domain name, and you can also change this over migration
by providing an updated configuration file (at least with xl).

> So this would be a dependency that the driver domain is started before the
> stubdom with qemu.

Yes.

> In some stubdom startup script I saw the parameter "target" while creating
> the stubdom. Is this a way to combine two domUs?

I think "target" in this context refers to the HVM guest for which the
ioemu-stubdom is providing services.

> As far as I understand, with a driver domU I need the $domid of the driver
> domU in my config, so this is the connection between the two domUs.
> But with stub-domains I haven't understood how data flows between stubdom
> and domU, and because I've seen a lot of nice little pictures describing
> that I/O flows from the domU over the stubdom to dom0 and back, I thought
> stub-domains would be the way to go.

Only if you are using emulated I/O. I assumed you were using PV I/O, is that
not the case?

Ian.
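P.S. For the "updated configuration file" part, with xl that would be
something along the lines of the following (check the xl man page for the
exact option; names are made up):

  xl migrate -C mydomu-on-target.cfg mydomu target-host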
Hello,

On 30.01.2013 at 10:36, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2013-01-29 at 19:32 +0000, Markus Hochholdinger wrote:
[..]
> The change of $domid doesn't matter, since a migration involves
> reconnecting the devices anyway, which means they will reconnect to the
> "new" driver domain.

OK, I understand: it is not the numeric id that has to stay the same but the
name of the driver domU.

> The normal way would be to have a driver domain per host, but there's no
> reason you couldn't make it such that the driver domain was migrated too
> (you'd have to do some hacking to make this work).

If my driver domU is on the same hardware host as the domU, I don't have to
care about split-brain situations for the storage. So my idea is to have one
driver domU for each normal domU. The live migration is possibly difficult:
as I understand it, I somehow have to create the driver domU on the
destination so that the block device my normal domU is to be connected to
exists there. I'll look into this if I find no easier solution.

> AIUI you currently have a RAID1 device in the guest, presumably constructed
> from 2 xvd* devices? What are those two xvd* devices backed by? I presume
> it must be some sort of network storage (NFS, iSCSI, NBD, DRBD) or else you
> just couldn't migrate.

Well, perhaps some device paths say more than my bad English:

  node1:/dev/xbd/mydomu.node1 -> /dev/vg0/mydomu (also exported over iSCSI)
  node1:/dev/xbd/mydomu.node2 -> /dev/sdx (imported over iSCSI)
  node2:/dev/xbd/mydomu.node1 -> /dev/sdy (imported over iSCSI)
  node2:/dev/xbd/mydomu.node2 -> /dev/vg0/mydomu (also exported over iSCSI)
  node3:/dev/xbd/mydomu.node1 -> /dev/sdy (imported over iSCSI)
  node3:/dev/xbd/mydomu.node2 -> /dev/sdz (imported over iSCSI)

In /dev/xbd/* there are only symlinks to the according device, so I have a
consistent path on all nodes. On all hardware nodes (node1, node2 and node3)
I can access the logical volume /dev/vg0/mydomu of node1 with the path
/dev/xbd/mydomu.node1, which I use in my domU configurations. If I'm not on
node1, the block device is transported over iSCSI. Because
/dev/xbd/mydomu.node1 points to the same block device on all nodes, I'm able
to live migrate the domUs independently of the physical location of the
logical volume.

So my xvda and xvdb inside the domU are backed by /dev/xbd/mydomu.node1 and
/dev/xbd/mydomu.node2, and if one or both of these logical volumes are not
local, iSCSI (in dom0) is used for transport.

> Are you intending to instead run the RAID1 device in a "driver domain",
> constructed from 2 xvd* devices exported from dom0, and export that as a
> single xvd* device to the guest?

Yes, somehow. But the devices exported from dom0 don't have to be local
logical volumes of that dom0; they can be remote iSCSI block devices. For me
it is very important to be able to live migrate domUs but also to have the
storage redundant over at least two nodes.

> Or are you intending to surface the network storage directly into the
> driver domain, construct the RAID device from those and export that as an
> xvd* to the guest?

No.

> > Do you know how I can live migrate a domU which depends on a driver
> > domU? How can I migrate the driver domU?
> > To my understanding, the block device has to be there on the destination
> > dom0 before live migration begins, but it is also used on the source
> > dom0 by the migrating, still running, domU.
> Not quite: when you migrate there is a pause period while the final copy
> over occurs, and at this point you can safely remove the device from the
> source host and make it available on the target host. The toolstack will

Isn't the domU on the destination created with all its virtual devices before
the migration starts? What if blkback is not ready on the destination host?
Am I missing something?

> ensure that the block device is only ever active on one end or the other
> and never on both -- otherwise you would get potential corruption.

Yeah, this is the problem! If I migrate the active RAID1 logic within the
domU (aka Linux software RAID1), I don't have to care. I'll try to accomplish
the same with a "helper" domU very near to the normal domU, which is live
migrated while the normal domU is migrated.

> While you could migrate the driver domain during the main domU's pause
> period, it is much more normal to simply have a driver domain on each host
> and dynamically configure the storage as you migrate.

If I dynamically create the software RAID1, I have to add a lot of checks
which I don't need now. I've already thought about a software RAID1 in dom0
with the resulting md device as xvda for a domU. But I would have to assemble
the md device on the destination host before I can deactivate the md device
on the source host. One race condition is: if I deactivate the md device on
the source host while data is only written to one of the two devices, the
RAID1 seems clean on the destination host but my two devices differ. The
other race condition is that my RAID1 is inconsistent while assembling on the
destination host.

> > Can I combine a driver domU with a normal domU like I can combine a
> > stubdom with a normal domU?
> If you want, but it would be more typical to have a single driver domain
> providing block services to all domains (or one per underlying physical
> block device).

I want :-) A single driver domain would need more logic (for me) while doing
live migrations.

> > I thought a stubdom live migrates with its domU, so you don't have to
> > worry that the driver domU live migrates while the according normal domU
> > migrates.
> A stubdom is a bit of an overloaded term. If you mean an ioemu stub domain
> (i.e. the qemu associated with an HVM guest) then a new one of those is
> started on the target host each time you migrate.

OK, this isn't what I want. For me a (re)start on the destination host is no
better than doing the software RAID1 in dom0.

> If you mean a xenstored stubdom then those are per host and are not
> migrated.

Hm, if they are not migrated they don't behave like I expect.

> And if you mean a driver domain then, as I say, those are usually per host
> and the domain will be connected to the appropriate local driver domain on
> the target host.

OK, it seems I want a driver domain which I would migrate while migrating the
according normal domU.

[..]

> > OK, I see. But still, would it be possible to run Linux in a
> > stub-domain? I've read e.g.
> > http://blog.xen.org/index.php/2012/12/12/linux-stub-domain/ which
> > describes this, but I'm unsure if this will be (or already is) supported
> > by current Xen?
> This work is about a Linux ioemu stub domain. That is a stubdomain with the
> sole purpose of running the qemu emulation process for an HVM domain. I
> think the intention is for this to land in Xen 4.3, but it does not have
> anything to do with your use case AFAICT.

OK, I see that. If the Linux ioemu stub domU is (re)started on the
destination host on a live migration, it doesn't solve my problem.

> Everything you want to do is already possible with what is in Xen and
> Linux today, in that the mechanisms all exist. However, what you are doing
> is not something which others have done, so there will necessarily need to
> be a certain amount of putting the pieces together on your part.

Yeah, this gives me hope :-)

[..]

> > Yes, I would provide two block devices (logical volumes) to the driver
> > domU,
> How are you doing this? Where do those logical volumes come from and how
> are they getting into the driver domU?

See the explanation above for details: the logical volumes come from the
local host and/or from remote hosts over iSCSI, with a consistent path on all
hosts. (A sketch of the import/symlink step is at the end of this mail.)

> > create a software RAID1 device there and make the md device available
> > with blkback. I would do this for each domU so I can live migrate the
> > domUs independently. The driver domU only needs a kernel and an initrd
> > with a rootfs containing just enough to build the md device and export
> > it with blkback.
> > But how can I address this exported block device? As far as I've seen, I
> > need the $domid of the driver domain in the config file of my domU, or
> > am I missing something?
> $domid can also be a domain name, and you can also change this over
> migration by providing an updated configuration file (at least with xl).

If $domid can be a name, it really would be possible for me. Great!

> > So this would be a dependency that the driver domain is started before
> > the stubdom with qemu.
> Yes.
> > In some stubdom startup script I saw the parameter "target" while
> > creating the stubdom. Is this a way to combine two domUs?
> I think "target" in this context refers to the HVM guest for which the
> ioemu-stubdom is providing services.

Yes, I've also thought this way. So my idea was to create a driver domain
with target set to the according domU, so I only "see" one domU. I thought it
could ease the management.

> > As far as I understand, with a driver domU I need the $domid of the
> > driver domU in my config, so this is the connection between the two
> > domUs. But with stub-domains I haven't understood how data flows between
> > stubdom and domU, and because I've seen a lot of nice little pictures
> > describing that I/O flows from the domU over the stubdom to dom0 and
> > back, I thought stub-domains would be the way to go.
> Only if you are using emulated I/O. I assumed you were using PV I/O, is
> that not the case?

OK, my bad. I have two use cases:
1. Provide a redundant block device for a PV domU where I'm not able to
   manage the software RAID1 inside the domU.
2. Provide a redundant block device for an HVM domU running operating systems
   which have no (good) software RAID1 implementation.

Many thanks so far. I'll try the driver domain approach and test whether it
solves my problem and doesn't lose too much performance.

--
greetings eMHa
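P.S. The import/symlink step mentioned above is essentially only this per
volume (IQN, portal and device names are made up):

  # on a node that does not hold the volume locally, log in to the peer ...
  iscsiadm -m node -T iqn.2006-01.net.example:node1.vg0.mydomu -p node1:3260 --login
  # ... and give whatever device shows up the stable name used in the configs
  ln -sf /dev/sdx /dev/xbd/mydomu.node1   # in practice done by a udev rule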
On Wed, 2013-01-30 at 15:35 +0000, Markus Hochholdinger wrote:
> > > Do you know how I can live migrate a domU which depends on a driver
> > > domU? How can I migrate the driver domU?
> > > To my understanding, the block device has to be there on the
> > > destination dom0 before live migration begins, but it is also used on
> > > the source dom0 by the migrating, still running, domU.
> > Not quite: when you migrate there is a pause period while the final copy
> > over occurs, and at this point you can safely remove the device from the
> > source host and make it available on the target host. The toolstack will
>
> Isn't the domU on the destination created with all its virtual devices
> before the migration starts?

No.

> What if blkback is not ready on the destination host?

We have to arrange that it is.

> Am I missing something?

Migration is a staged process.

     1. First an empty shell domain (with no devices) is created on the
        target host.
     2. Then we copy the memory over, in several iterations, while the
        domain is running on the source host (iterations happen to handle
        the guest dirtying memory as we copy, this is the "live" aspect of
        the migration).
     3. After some iterations of live migration we pause the source guest.
     4. Now we copy the remaining dirty RAM.
     5. Tear down devices on the source host.
     6. Set up devices on the target host for the incoming domain.
     7. Resume the guest on the target domain.
     8. Guest reconnects to new backend.

The key point is that the devices are only ever active on either the source
or the target host and never both. The domain is paused during this final
transfer (from #3 until #7) and therefore guest I/O is quiesced.

In your scenario I would expect that in the interval of #5,#6 you would
migrate the associated driver domain.

> > ensure that the block device is only ever active on one end or the other
> > and never on both -- otherwise you would get potential corruption.
>
> Yeah, this is the problem! If I migrate the active RAID1 logic within the
> domU (aka Linux software RAID1), I don't have to care. I'll try to
> accomplish the same with a "helper" domU very near to the normal domU,
> which is live migrated while the normal domU is migrated.

This might be possible, but as I say the more normal approach would be to
have a "RAID" domain on both hosts and dynamically map and unmap the backing
guest disks at steps #5 and #6 above.

> > While you could migrate the driver domain during the main domU's pause
> > period, it is much more normal to simply have a driver domain on each
> > host and dynamically configure the storage as you migrate.
>
> If I dynamically create the software RAID1, I have to add a lot of checks
> which I don't need now.
> I've already thought about a software RAID1 in dom0 with the resulting md
> device as xvda for a domU. But I would have to assemble the md device on
> the destination host before I can deactivate the md device on the source
> host.

No you don't, you deactivate on the source (step #5) before activating on the
target (step #6).

> One race condition is: if I deactivate the md device on the source host
> while data is only written to one of the two devices, the RAID1 seems
> clean on the destination host but my two devices differ. The other race
> condition is that my RAID1 is inconsistent while assembling on the
> destination host.

I'd have thought that shutting down the RAID in step #5 and reactivating it
in step #6 would guarantee that neither of these were possible.

> > > Can I combine a driver domU with a normal domU like I can combine a
> > > stubdom with a normal domU?
> > If you want, but it would be more typical to have a single driver domain
> > providing block services to all domains (or one per underlying physical
> > block device).
>
> I want :-) A single driver domain would need more logic (for me) while
> doing live migrations.

OK, but be aware that you are treading into unexplored territory; most people
do things the other way. This means you are likely going to have to do a fair
bit of heavy lifting yourself.

Ian.
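P.S. To make the #5/#6 idea concrete for your md case, those two steps would
boil down to roughly the following (device names made up, error handling
omitted):

  mdadm --stop /dev/md/mydomu                         # step #5, on the source host
  mdadm --assemble /dev/md/mydomu \
        /dev/xbd/mydomu.node1 /dev/xbd/mydomu.node2   # step #6, on the target host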
Hello,

On 31.01.2013 at 12:22, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Wed, 2013-01-30 at 15:35 +0000, Markus Hochholdinger wrote:
> > Isn't the domU on the destination created with all its virtual devices
> > before the migration starts?
> No.

Oh, this opens new possibilities for me :-)

> > What if blkback is not ready on the destination host?
> We have to arrange that it is.

OK, I see.

> > Am I missing something?
> Migration is a staged process.
>      1. First an empty shell domain (with no devices) is created on the
>         target host.
>      2. Then we copy the memory over, in several iterations, while the
>         domain is running on the source host [...]
>      3. After some iterations of live migration we pause the source guest.
>      4. Now we copy the remaining dirty RAM.
>      5. Tear down devices on the source host.
>      6. Set up devices on the target host for the incoming domain.
>      7. Resume the guest on the target domain.
>      8. Guest reconnects to new backend.
> The key point is that the devices are only ever active on either the
> source or the target host and never both. The domain is paused during this
> final transfer (from #3 until #7) and therefore guest I/O is quiesced.

At what point are scripts like

  disk = [ "..., script=myblockscript.sh" ]

executed? Would this be between #3 and #7?

> In your scenario I would expect that in the interval of #5,#6 you would
> migrate the associated driver domain.

OK.

> > Yeah, this is the problem! If I migrate the active RAID1 logic within
> > the domU (aka Linux software RAID1), I don't have to care. I'll try to
> > accomplish the same with a "helper" domU very near to the normal domU,
> > which is live migrated while the normal domU is migrated.
> This might be possible, but as I say the more normal approach would be to
> have a "RAID" domain on both hosts and dynamically map and unmap the
> backing guest disks at steps #5 and #6 above.

With the above info, that block devices are removed and added in the right
order while doing live migration, I'm thinking more and more about a driver
domain. But first I'll test the stopping and assembling of md devices in the
dom0s while migrating. If this works, I could put this job into a driver
domain. Wow, this gives me a new view of the setup.

> > If I dynamically create the software RAID1, I have to add a lot of
> > checks which I don't need now.
> > I've already thought about a software RAID1 in dom0 with the resulting
> > md device as xvda for a domU. But I would have to assemble the md device
> > on the destination host before I can deactivate the md device on the
> > source host.
> No you don't, you deactivate on the source (step #5) before activating on
> the target (step #6).

This wasn't clear to me before. Many thanks for this info.

> > One race condition is: if I deactivate the md device on the source host
> > while data is only written to one of the two devices, the RAID1 seems
> > clean on the destination host but my two devices differ. The other race
> > condition is that my RAID1 is inconsistent while assembling on the
> > destination host.
> I'd have thought that shutting down the RAID in step #5 and reactivating
> it in step #6 would guarantee that neither of these were possible.

I'll try this first of all. If this works, I'll recheck the performance
against DRBD and probably try a driver domain with this.

> > > > Can I combine a driver domU with a normal domU like I can combine a
> > > > stubdom with a normal domU?
> > > If you want, but it would be more typical to have a single driver
> > > domain providing block services to all domains (or one per underlying
> > > physical block device).
> > I want :-) A single driver domain would need more logic (for me) while
> > doing live migrations.
> OK, but be aware that you are treading into unexplored territory; most
> people do things the other way. This means you are likely going to have to
> do a fair bit of heavy lifting yourself.

If this solves my problem, I'm willing to go into unexplored territory :-)
But I'm also sane enough to test the common ways first.

Many thanks so far.

--
greetings eMHa
> > > Am I missing something?
> > Migration is a staged process.
> >      1. First an empty shell domain (with no devices) is created on the
> >         target host.
> >      [...]
> >      8. Guest reconnects to new backend.
> > The key point is that the devices are only ever active on either the
> > source or the target host and never both. The domain is paused during
> > this final transfer (from #3 until #7) and therefore guest I/O is
> > quiesced.
>
> At what point are scripts like
>
>   disk = [ "..., script=myblockscript.sh" ]
>
> executed? Would this be between #3 and #7?

It is part of the device teardown and setup, so it is during #5 and #6
(strictly, I think it is just after #5 and just before #6).

On xen-devel at the minute there is a patch series under discussion to make
the script hooks more flexible, in particular adding pre- and post-migrate
hooks (called somewhere around #1-#3 and #7-#8) which can pre-setup bits of
the storage stack which are safe to do with the guest running but might be
slow to initialise (e.g. iSCSI login, but not opening the device). I don't
think this needs to affect you, though.

> > > Yeah, this is the problem! If I migrate the active RAID1 logic within
> > > the domU (aka Linux software RAID1), I don't have to care. I'll try to
> > > accomplish the same with a "helper" domU very near to the normal domU,
> > > which is live migrated while the normal domU is migrated.
> > This might be possible, but as I say the more normal approach would be
> > to have a "RAID" domain on both hosts and dynamically map and unmap the
> > backing guest disks at steps #5 and #6 above.
>
> With the above info, that block devices are removed and added in the right
> order while doing live migration, I'm thinking more and more about a
> driver domain.
>
> But first I'll test the stopping and assembling of md devices in the dom0s
> while migrating. If this works, I could put this job into a driver domain.
> Wow, this gives me a new view of the setup.

Excellent ;-)

Ian.
Hello,

On 01.02.2013 at 09:56, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > At what point are scripts like
> >   disk = [ "..., script=myblockscript.sh" ]
> > executed? Would this be between #3 and #7?
> It is part of the device teardown and setup, so it is during #5 and #6
> (strictly, I think it is just after #5 and just before #6).
> On xen-devel at the minute there is a patch series under discussion to
> make the script hooks more flexible, in particular adding pre- and
> post-migrate hooks [...] which can pre-setup bits of the storage stack
> which are safe to do with the guest running but might be slow to
> initialise (e.g. iSCSI login, but not opening the device). I don't think
> this needs to affect you, though.

At least with the xm toolstack, the block scripts in /etc/xen/scripts/block-*
are executed on the destination host (add) before the script (remove) is
executed on the source host while live migrating a domU.
(I've created the script /etc/xen/scripts/block-md, which assembles and stops
RAID1 devices; a stripped-down sketch of it is at the end of this mail.)

Next, I'll test the xl toolstack with this setup.

[..]

--
greetings eMHa
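P.S. For reference, the block-md script is essentially nothing more than the
following (a stripped-down sketch with no error handling; it leans on the
helpers from the in-tree block-common.sh and assumes the array components are
known from mdadm.conf or the on-disk metadata):

  #!/bin/bash
  # /etc/xen/scripts/block-md -- minimal sketch of a custom block hotplug script.
  # The toolstack calls it with "add" or "remove" and sets XENBUS_PATH.
  dir=$(dirname "$0")
  . "$dir/block-common.sh"       # provides $command, xenstore_read, write_dev

  # the disk line is set up so that the vbd's xenstore "params" key carries
  # the md device path
  md=$(xenstore_read "$XENBUS_PATH/params")

  case "$command" in
    add)
      mdadm --assemble "$md"     # bring up the mirror on this host
      write_dev "$md"            # hand the assembled device to blkback
      ;;
    remove)
      mdadm --stop "$md"         # tear the mirror down again
      ;;
  esac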
Hello,

On 06.02.2013 at 15:39, Markus Hochholdinger <Markus@hochholdinger.net>
wrote:
> On 01.02.2013 at 09:56, Ian Campbell <Ian.Campbell@citrix.com> wrote:
[..]
> > > executed? Would this be between #3 and #7?
> > It is part of the device teardown and setup, so it is during #5 and #6
> > (strictly, I think it is just after #5 and just before #6). [...]
> At least with the xm toolstack, the block scripts in
> /etc/xen/scripts/block-* are executed on the destination host (add) before
> the script (remove) is executed on the source host while live migrating a
> domU.
> (I've created the script /etc/xen/scripts/block-md, which assembles and
> stops RAID1 devices.)
> Next, I'll test the xl toolstack with this setup.

With Xen 4.1 there is no support for custom scripts within libxl.

With Xen 4.2.1 there is support for custom scripts (like with the xm
toolstack), but again only add/remove. And add is called on the destination
side before remove is called on the transmitting side while doing a live
migration of a domU.

Next I'll test the latest development version...

--
greetings eMHa
On Mon, 2013-02-11 at 14:59 +0000, Markus Hochholdinger wrote:
> Hello,
>
> On 06.02.2013 at 15:39, Markus Hochholdinger <Markus@hochholdinger.net>
> wrote:
> > On 01.02.2013 at 09:56, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> [..]
> > At least with the xm toolstack, the block scripts in
> > /etc/xen/scripts/block-* are executed on the destination host (add)
> > before the script (remove) is executed on the source host while live
> > migrating a domU.
> > (I've created the script /etc/xen/scripts/block-md, which assembles and
> > stops RAID1 devices.)
> > Next, I'll test the xl toolstack with this setup.
>
> With Xen 4.1 there is no support for custom scripts within libxl.
>
> With Xen 4.2.1 there is support for custom scripts (like with the xm
> toolstack), but again only add/remove. And add is called on the
> destination side before remove is called on the transmitting side while
> doing a live migration of a domU.

This sounds like a bug which ought to be addressed (Roger, can you take a
look?)

> Next I'll test the latest development version...

I'm not sure it will differ from 4.2.x in this area (yet). Roger can probably
advise better than me though.

Ian.
On 11/02/13 16:05, Ian Campbell wrote:
> On Mon, 2013-02-11 at 14:59 +0000, Markus Hochholdinger wrote:
>> Hello,
>>
>> On 06.02.2013 at 15:39, Markus Hochholdinger <Markus@hochholdinger.net>
>> wrote:
>>> On 01.02.2013 at 09:56, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> [..]
>>> At least with the xm toolstack, the block scripts in
>>> /etc/xen/scripts/block-* are executed on the destination host (add)
>>> before the script (remove) is executed on the source host while live
>>> migrating a domU.
>>> (I've created the script /etc/xen/scripts/block-md, which assembles and
>>> stops RAID1 devices.)
>>> Next, I'll test the xl toolstack with this setup.
>>
>> With Xen 4.1 there is no support for custom scripts within libxl.
>>
>> With Xen 4.2.1 there is support for custom scripts (like with the xm
>> toolstack), but again only add/remove. And add is called on the
>> destination side before remove is called on the transmitting side while
>> doing a live migration of a domU.

Yes, I've also realized this while working on the new hotplug
implementation. The hotplug script is executed on the destination before the
other end has executed the remove script (this is due to the fact that the
remove script is executed when the migrated domain is destroyed on the
source). So at a certain point the destination host has executed the "add"
script before the source host executes the "remove" hotplug script.

This is not a problem with the current in-tree hotplug scripts, because we
can guarantee that the device will not be accessed simultaneously (the guest
only resumes on either the source or the destination host, but never on
both).

So the scheme looks more like:

    1. First an empty shell domain (with no devices) is created on the
       target host.
    2. Then we copy the memory over, in several iterations, while the
       domain is running on the source host (iterations happen to handle
       the guest dirtying memory as we copy, this is the "live" aspect of
       the migration).
    3. After some iterations of live migration we pause the source guest.
    4. Set up devices on the target host for the incoming domain.
    5. Now we copy the remaining dirty RAM.
    6. Resume the guest on the target domain.
    7. Tear down devices on the source host.
    8. Guest reconnects to new backend.

(#7 and #8 can happen in a different order.)

#4 will be where the hotplug script "add" call happens on the target host,
and #7 is where the hotplug script "remove" call happens on the source host.

> This sounds like a bug which ought to be addressed (Roger, can you take
> a look?)

I think this is how migration works in both xl and xm, but if there are
hotplug scripts that cannot be executed simultaneously (i.e. you cannot make
two simultaneous calls to "add" without calling "remove" first) we could mark
it as a bug. It would make the resume on the source host more complicated,
since in case of failure we would have to remove the devices on the
destination host and reconnect them on the source host.

>> Next I'll test the latest development version...
>
> I'm not sure it will differ from 4.2.x in this area (yet). Roger can
> probably advise better than me though.

No, this has not changed in -unstable.
Hello,

On 11.02.2013 at 17:00, Roger Pau Monné <roger.pau@citrix.com> wrote:
> On 11/02/13 16:05, Ian Campbell wrote:
> > On Mon, 2013-02-11 at 14:59 +0000, Markus Hochholdinger wrote:
[..]
> >> With Xen 4.2.1 there is support for custom scripts (like with the xm
> >> toolstack), but again only add/remove. And add is called on the
> >> destination side before remove is called on the transmitting side while
> >> doing a live migration of a domU.
> Yes, I've also realized this while working on the new hotplug
> implementation. The hotplug script is executed on the destination before
> the other end has executed the remove script (this is due to the fact that
> the remove script is executed when the migrated domain is destroyed on the
> source). So at a certain point the destination host has executed the "add"
> script before the source host executes the "remove" hotplug script.

OK, so this is what I thought before. Many thanks for the clarification.

> This is not a problem with the current in-tree hotplug scripts, because we
> can guarantee that the device will not be accessed simultaneously (the
> guest only resumes on either the source or the destination host, but never
> on both).
> So the scheme looks more like:
>     1. First an empty shell domain (with no devices) is created on the
>        target host.
>     [...]
>     4. Set up devices on the target host for the incoming domain.
>     5. Now we copy the remaining dirty RAM.
>     6. Resume the guest on the target domain.
>     7. Tear down devices on the source host.
>     8. Guest reconnects to new backend.
> (#7 and #8 can happen in a different order.)
> #4 will be where the hotplug script "add" call happens on the target host,
> and #7 is where the hotplug script "remove" call happens on the source
> host.

As I understand it now, the (block) device on the destination host will be
read before the block device on the source is detached.

> > This sounds like a bug which ought to be addressed (Roger, can you take
> > a look?)
> I think this is how migration works in both xl and xm, but if there are
> hotplug scripts that cannot be executed simultaneously (i.e. you cannot
> make two simultaneous calls to "add" without calling "remove" first) we
> could mark it as a bug.

No, it was not a bug in the hotplug scripts; I made a hotplug script myself
to assemble Linux RAID1 devices and logged timestamps of the executions.

> It would make the resume on the source host more complicated, since in
> case of failure we would have to remove the devices on the destination
> host and reconnect them on the source host.

I understand.

> >> Next I'll test the latest development version...
> > I'm not sure it will differ from 4.2.x in this area (yet). Roger can
> > probably advise better than me though.
> No, this has not changed in -unstable.

OK. Many thanks.

--
greetings eMHa