Ian Pratt
2006-Jun-20 11:07 UTC
RE: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
> AW> This should be fixable though. I'm also not sure how carefully
> AW> dm-u watches block completion responses to ensure safety of
> AW> metadata updates relative to data writes. This too should be
> AW> fixable -- I just don't know if the user-level tools can currently
> AW> request completion notifications on requests that they've
> AW> processed.
>
> So, right now, we're a little optimistic about metadata writing. It
> will be relatively easy to hijack the callback routine for the disk
> request (a technique which is heavily used in the rest of the block
> layer) to get a completion trigger. We can then notify userspace for
> the metadata write and then trigger the original callback routine for
> completion.

Yep, dm-userspace is certainly going to need to have a way of intercepting IO completions and then choosing when it's actually going to propagate the completion to the backend. That's quite a big change to the current code (incidentally, the dm-snap code is pretty shocking in this respect too).

> AW> A benefit to the dm-user patch is that it is more of a linux
> AW> approach than a xen+linux approach. Dm-user will be generally
> AW> useful in the linux tree
>
> Right, this is a huge advantage, I think. Being able to mount images
> as if they were disks will be quite helpful. Another benefit is the
> ability to easily convert between formats. Converting a vmdk to a
> qcow is as easy as mounting both and doing a "cp -R" between them.

I think the blktap code should definitely export a kernel device at the top so that the same property holds. Should be easy to add.

> AW> which has some bad failure characteristics which can result in
> AW> both data being acknowledged as written even though it hasn't
> AW> been, and the OOM killer going insane. I think some fixes to loop
> AW> probably need to be applied in the near future given how much
> AW> people are generally depending on the code with VMs.
>
> Can you elaborate about what specifically is wrong with the loop
> driver?

It doesn't bypass the buffer cache (so all bets are off for data integrity) and can end up consuming all of dom0 memory with dirty buffers -- just create a few loop devices and do a few parallel dd's to them and watch the oomkiller go on the rampage. It's even worse if the filesystem the file lives on is slow e.g. NFS.

> AW> Julian and I have talked about extending the tap driver to combine
> AW> it with blkback and allow block address translation without access
> AW> to request contents.
>
> Since the kernel already has a block address translation solution
> (i.e. device-mapper), is there a benefit to adding another
> xen-specific one?

I think blktap and dm-userspace are quite complementary, so I don't see a problem with having them both in the tree. Right now, blktap looks to be the more mature solution, but dm-userspace could catch up. Blktap will obviously still be preferable when it's necessary to actually touch the data.

Ian

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
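The completion interception discussed above can be illustrated with a small user-space model. This is only a sketch in Python, not kernel or dm-userspace code; `Request`, `CompletionInterceptor` and `metadata_written` are hypothetical names invented for the illustration. It assumes a request's completion should reach the backend only after both the data write and the matching metadata write have finished, in whichever order they finish.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Request:
    """Toy stand-in for a block request; on_complete plays the end-io callback."""
    sector: int
    on_complete: Callable[["Request"], None]


class CompletionInterceptor:
    """Holds back a completion until data and metadata writes are both done."""

    def __init__(self) -> None:
        self._pending: Dict[int, dict] = {}

    def intercept(self, req: Request) -> None:
        # Save the original callback and substitute our own ("hijack" it).
        state = {"req": req, "original": req.on_complete,
                 "data_done": False, "meta_done": False}
        self._pending[req.sector] = state

        def hijacked(_req: Request) -> None:
            # The disk finished the data write; do not tell the backend yet.
            state["data_done"] = True
            self._maybe_propagate(req.sector)

        req.on_complete = hijacked

    def metadata_written(self, sector: int) -> None:
        # Userspace reports that the metadata update for this remap is durable.
        self._pending[sector]["meta_done"] = True
        self._maybe_propagate(sector)

    def _maybe_propagate(self, sector: int) -> None:
        state = self._pending[sector]
        if state["data_done"] and state["meta_done"]:
            del self._pending[sector]
            state["original"](state["req"])  # now the backend sees completion
```

In this model the backend's callback fires exactly once per request, and never before the metadata is reported written, which is the ordering property the thread is asking for.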
Dan Smith
2006-Jun-20 21:10 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
IP> Yep, dm-userspace is certainly going to need to have a way of
IP> intercepting IO completions and then choosing when it's actually
IP> going to propagate the completion to the backend. That's quite a
IP> big change to the current code (incidentally, the dm-snap code is
IP> pretty shocking in this respect too).

I'm not sure if I agree that it will be a big change. It's going to require keeping track of a few additional states for each remap, as well as a couple more message types. Hijacking the callback function of each request is done quite a bit in the rest of the block subsystem. My testing shows that communication between kernel and userspace for the additional handshaking will not add significant additional overhead. Definitely some work, but not a huge change, IMHO.

IP> It doesn't bypass the buffer cache (so all bets are off for data
IP> integrity) and can end up consuming all of dom0 memory with dirty
IP> buffers -- just create a few loop devices and do a few parallel
IP> dd's to them and watch the oomkiller go on the rampage. It's even
IP> worse if the filesystem the file lives on is slow e.g. NFS.

Ok, it seems like this should be addressed in the upstream loop driver. I imagine quite a few people are depending on the loop driver right now, expecting it to maintain data integrity.

Could the loop driver make use of the routines that do direct IO instead of the normal routines to solve this when it's an issue?

This brings me to another question: Will people really be using file-based images for their VMs? It seems to me that the performance of using a block device overshadows the convenience of a file image.

--
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com
Anthony Liguori
2006-Jun-21 14:45 UTC
[Xen-devel] Re: [PATCH] Blktap: Userspace file-based image support.(RFC)
On Tue, 20 Jun 2006 14:10:30 -0700, Dan Smith wrote:
> IP> It doesn't bypass the buffer cache (so all bets are off for data
> IP> integrity) and can end up consuming all of dom0 memory with dirty
> IP> buffers -- just create a few loop devices and do a few parallel
> IP> dd's to them and watch the oomkiller go on the rampage. It's even
> IP> worse if the filesystem the file lives on is slow e.g. NFS.
>
> Ok, it seems like this should be addressed in the upstream loop
> driver. I imagine quite a few people are depending on the loop driver
> right now, expecting it to maintain data integrity.

It's probably worth spending some cycles trying to improve the loop driver itself.

> Could the loop driver make use of the routines that do direct IO
> instead of the normal routines to solve this when it's an issue?

It appears that the loop driver is split between two threads using a producer/consumer queue. The main thread gets the bio requests and queues them for the consumer thread. The consumer thread can do a number of things depending on properties of the fd. It may use address ops, use fops->write, or do a transform of the data. It should be possible to, if the fd is opened with O_DIRECT and fops has a valid aio_{read,write}, use proper aio calls to queue the requests. You'll probably have to get clever about how the thread blocks (has to wake up either on the queue mutex or when an aio request completes). I suspect that this will have a pretty noticeable performance improvement in the loop driver (especially on SCSI/SATA storage).

The loop driver still has issues though. It cannot grow and it has a pretty odd hardcoded limit (256 devices) which quickly becomes a scalability issue. The former problem could possibly be addressed by having a parameter for SET_STATUS that lets you set the size of the device to be greater than the size of the underlying file. If a bio comes for an offset greater than the underlying file, it would have to be smart enough to ftruncate the file. The error handling is a bit tough (you'll have to make sure that if ftruncate fails, you fail the read/write -- extra points if the failure is temporary such that later on if space is freed up you succeed).

The hardcoded limit is a somewhat larger problem. The driver would likely need a bit of reworking. Since 256 is the limit based on minor number allocation, you would have to either get some more device number space for it or just have the ability to allocate dynamic numbers and rely on udev/hotplug for folks that want more than 256.

> This brings me to another question: Will people really be using
> file-based images for their VMs? It seems to me that the performance
> of using a block device overshadows the convenience of a file image.

If the performance of the loop driver could be better (and fundamentally, there's no reason it can't be pretty good), then I see no reason why using file images wouldn't be the most common approach. Files are quite a lot easier to manage than partitions.

Of course, I see no reason why someone couldn't write a FUSE front-end to LVM :-)

Regards,

Anthony Liguori
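The grow-on-demand idea above (advertise a device size larger than the backing file, and ftruncate when a request lands past end-of-file) can be sketched in user space. This is a minimal Python illustration rather than loop driver code: `write_block`, `read_block` and `virtual_size` are hypothetical names, and treating reads past end-of-file as zeros instead of growing the file is an assumption of this sketch, not something the message specifies.

```python
import os
import tempfile


def write_block(fd: int, offset: int, data: bytes, virtual_size: int) -> None:
    """Write data at offset, growing the backing file first if needed."""
    end = offset + len(data)
    if end > virtual_size:
        raise ValueError("request beyond the advertised device size")
    if end > os.fstat(fd).st_size:
        # Grow the file. If this raises (for example ENOSPC), the request
        # fails here; a later retry can succeed once space is freed up.
        os.ftruncate(fd, end)
    written = os.pwrite(fd, data, offset)
    if written != len(data):
        raise OSError("short write to backing file")


def read_block(fd: int, offset: int, length: int, virtual_size: int) -> bytes:
    """Read length bytes at offset; space past end-of-file reads as zeros."""
    if offset + length > virtual_size:
        raise ValueError("request beyond the advertised device size")
    data = os.pread(fd, length, offset)
    # Assumption of this sketch: pad with zeros rather than grow on read.
    return data + b"\0" * (length - len(data))
```

Because the ftruncate failure surfaces as an exception before any data is written, a failed grow fails only that request and leaves the file unchanged, which matches the "fail the read/write, succeed later if space is freed" behaviour described above.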
Stephen C. Tweedie
2006-Jun-30 13:41 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
Hi,

On Tue, 2006-06-20 at 14:10 -0700, Dan Smith wrote:
> This brings me to another question: Will people really be using
> file-based images for their VMs? It seems to me that the performance
> of using a block device overshadows the convenience of a file image.

It depends on the environment. To support cold/live migration, having network-attached storage will be required; and file images on NFS would be an extremely simple-to-setup way to achieve that.

Personally I use LVM block devices almost exclusively when doing single-node testing, but NFS files are the easiest way I've got to share those images.

--Stephen
Dan Smith
2006-Jun-30 14:17 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
SCT> It depends on the environment. To support cold/live migration,
SCT> having network-attached storage will be required; and file images
SCT> on NFS would be an extremely simple-to-setup way to achieve that.

Ah, but block devices can play too. With dm-userspace, we could migrate a domain from one machine to another, faulting the needed blocks from its block devices on-demand, and copying the rest in the background. This would give us a peer-to-peer setup where block devices could slowly move from machine to machine, following its owner. Once your block was accessed (or copied in the background), it's local and fast. A peer-to-peer NAS setup.

What do you think?

--
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com
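The demand-fault scheme proposed above can be modelled in a few lines. The following Python sketch is illustrative only and is not dm-userspace code: `LazyMigratedDevice` and its methods are hypothetical names, the "source host" is just a list in memory, and it assumes a single writer (the migrated domain) so the source copy does not change once migration starts.

```python
from typing import Dict, List


class LazyMigratedDevice:
    """Destination-side view of a block device that is still migrating."""

    def __init__(self, source: List[bytes]) -> None:
        self._source = source               # stands in for the remote host
        self._local: Dict[int, bytes] = {}  # blocks already on this host
        self.remote_fetches = 0

    def _fault_in(self, idx: int) -> None:
        # Pull one block across the "network" the first time it is needed.
        if idx not in self._local:
            self._local[idx] = self._source[idx]
            self.remote_fetches += 1

    def read(self, idx: int) -> bytes:
        self._fault_in(idx)
        return self._local[idx]

    def write(self, idx: int, data: bytes) -> None:
        # A whole-block write makes the block local without fetching it.
        self._local[idx] = data

    def background_copy_step(self) -> bool:
        """Copy one block that is not yet local; False once none remain."""
        for idx in range(len(self._source)):
            if idx not in self._local:
                self._fault_in(idx)
                return True
        return False

    def is_complete(self) -> bool:
        return len(self._local) == len(self._source)
```

Once `is_complete()` returns true, nothing on the destination reads from the source any more, so the source stops being a point of failure from that moment on.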
Stephen C. Tweedie
2006-Jun-30 19:37 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
Hi,

On Fri, 2006-06-30 at 07:17 -0700, Dan Smith wrote:
> SCT> It depends on the environment. To support cold/live migration,
> SCT> having network-attached storage will be required; and file images
> SCT> on NFS would be an extremely simple-to-setup way to achieve that.
>
> Ah, but block devices can play too. With dm-userspace, we could
> migrate a domain from one machine to another, faulting the needed
> blocks from its block devices on-demand, and copying the rest in the
> background. This would give us a peer-to-peer setup where block
> devices could slowly move from machine to machine, following its
> owner. Once your block was accessed (or copied in the background),
> it's local and fast. A peer-to-peer NAS setup.

Could be useful in places, but it introduces a number of new dependencies. The destination host now relies on the source host for data, so if the source crashes, you crash the destination too; and if you power-cycle, how do you track where in your cluster the latest copy of the block device is?

A true NAS solution isolates the Xen hosts from these problems.

--Stephen
Dan Smith
2006-Jun-30 20:06 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
ST> Could be useful in places, but it introduces a number of new
ST> dependencies.

I was mostly commenting about making migrating block devices as easy as (or easier than) file-backed domains, especially from a migration point of view. Being able to use local LVMs but still migrate easily without a NAS would be cool, I think, where appropriate.

ST> The destination host now relies on the source host for data, so if
ST> the source crashes, you crash the destination too;

Sure, which a NAS solves, assuming the NAS is stable.

ST> and if you power-cycle, how do you track where in your cluster the
ST> latest copy of the block device is?

I think that keeping metadata on that and invalidating blocks when you pull them off the source host could be done without too much trouble. Plus, I'm not talking about multiple-writers, so I think you could ignore a lot of the normal locking issues.

ST> A true NAS solution isolates the Xen hosts from these problems.

Absolutely. So what's the benefit of having image files on NFS (as you mentioned) if you can use nbd or iSCSI?

--
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com
Jerone Young
2006-Jun-30 22:15 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
On Fri, 2006-06-30 at 13:06 -0700, Dan Smith wrote:
> ST> Could be useful in places, but it introduces a number of new
> ST> dependencies.
>
> I was mostly commenting about making migrating block devices as easy
> as (or easier than) file-backed domains, especially from a migration
> point of view. Being able to use local LVMs but still migrate easily
> without a NAS would be cool, I think, where appropriate.

I would ask how exactly do you propose to do this? Today at least, file-backed domains seem to be the only real-world way of doing migrations. Migrating block devices seems a little hairy (what if the other machine is already using sda, for example), and may not be all that practical to do.

>
> ST> The destination host now relies on the source host for data, so if
> ST> the source crashes, you crash the destination too;
>
> Sure, which a NAS solves, assuming the NAS is stable.
>
> ST> and if you power-cycle, how do you track where in your cluster the
> ST> latest copy of the block device is?
>
> I think that keeping metadata on that and invalidating blocks when you
> pull them off the source host could be done without too much trouble.
> Plus, I'm not talking about multiple-writers, so I think you could
> ignore a lot of the normal locking issues.
>
> ST> A true NAS solution isolates the Xen hosts from these problems.
>
> Absolutely. So what's the benefit of having image files on NFS (as
> you mentioned) if you can use nbd or iSCSI?
Mark Williamson
2006-Jul-01 00:36 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
> I would ask how exactly do you propose to do this ? Today at least
> file-backed domains seems to be the only real world way of doing
> migrations. Migrating block devices seems a little hairy (what if the
> other machine is already using sda for example), and may not be all that
> practical to do.

Well, it doesn't really matter what the destination dom0 is using as block devices provided the node name doesn't have to stay the same on the destination machine - and if you use the hotplug scripts to set up block devices then it doesn't.

I think being able to demand-fault virtual disks across would be quite cool (with the copy eventually completing in the background, eliminating the origin as a point of failure). For smaller, or more ad-hoc setups this could be quite useful (especially if you had a daemon trickle updates across the network continuously at low bandwidth to minimise the diffs during migration).

Cheers,
Mark

> > ST> The destination host now relies on the source host for data, so if
> > ST> the source crashes, you crash the destination too;
> >
> > Sure, which a NAS solves, assuming the NAS is stable.
> >
> > ST> and if you power-cycle, how do you track where in your cluster the
> > ST> latest copy of the block device is?
> >
> > I think that keeping metadata on that and invalidating blocks when you
> > pull them off the source host could be done without too much trouble.
> > Plus, I'm not talking about multiple-writers, so I think you could
> > ignore a lot of the normal locking issues.
> >
> > ST> A true NAS solution isolates the Xen hosts from these problems.
> >
> > Absolutely. So what's the benefit of having image files on NFS (as
> > you mentioned) if you can use nbd or iSCSI?

--
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
Dan Smith
2006-Jul-01 14:22 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
MW> Well, it doesn't really matter what the destination dom0 is using
MW> as block devices provided the node name doesn't have to stay the
MW> same on the destination machine - and if you use the hotplug
MW> scripts to set up block devices then it doesn't.

Right, exactly.

MW> I think being able to demand-fault virtual disks across would be
MW> quite cool (with the copy eventually completing in the background,
MW> eliminating the origin as a point of failure). For smaller, or
MW> more ad-hoc setups this could be quite useful (especially if you
MW> had a daemon trickle updates across the network continuously at
MW> low bandwidth to minimise the diffs during migration)

This is the exact situation I had in mind. I think it would be extremely cool to have a peer-to-peer block migration mechanism, which would allow the convenience of files for migration and the speed of block devices. You could even have a method for migrating block images between machines, independent of a migration. Imagine something like:

  lvmcp /dev/vols/foo othermachine:/dev/vols

I think that would be neat. It's rather straightforward too, I think.

--
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com
Mark Williamson
2006-Jul-03 11:00 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
> This is the exact situation I had in mind. I think it would be
> extremely cool to have a peer-to-peer block migration mechanism, which
> would allow the convenience of files for migration and the speed of
> block devices. You could even have a method for migrating block
> images between machines, independent of a migration. Imagine
> something like:
>
> lvmcp /dev/vols/foo othermachine:/dev/vols

Yes, that could even be nice and generic to other use cases, which is always a good sign and a good way of getting extra developers.

> I think that would be neat. It's rather straightforward too, I
> think.

Another thing I've always fancied is the ability to keep a virtual machine's memory and disk images on two machines in close-sync by continuously trickling diffs... This would be used in cases (e.g. desktop migration to a mobile device, emergency server relocation) where you do have warning that a migration is required but you want really low latency (e.g. before your UPS runs out, so you can pick up your laptop and run to a meeting, etc, etc).

Cheers,
Mark

--
Dave: Just a question. What use is a unicycle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
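The "continuously trickling diffs" idea can be pictured as dirty-block tracking with a bandwidth cap. The Python sketch below is only a toy model under stated assumptions: `TrickleMirror`, `trickle` and `pending` are hypothetical names, both "machines" are lists in one process, and it covers disk blocks only, not VM memory.

```python
from typing import List, Set


class TrickleMirror:
    """Keeps a replica close to the primary by sending only dirty blocks."""

    def __init__(self, block_count: int) -> None:
        self.primary: List[bytes] = [b""] * block_count
        self.replica: List[bytes] = [b""] * block_count
        self._dirty: Set[int] = set()

    def write(self, idx: int, data: bytes) -> None:
        # Writes go to the primary at once and are only marked for later sync.
        self.primary[idx] = data
        self._dirty.add(idx)

    def trickle(self, max_blocks: int) -> int:
        """Send up to max_blocks dirty blocks; return how many were sent."""
        sent = 0
        while self._dirty and sent < max_blocks:
            idx = self._dirty.pop()
            self.replica[idx] = self.primary[idx]
            sent += 1
        return sent

    def pending(self) -> int:
        """Number of blocks that would still need copying at migration time."""
        return len(self._dirty)
```

A daemon calling `trickle` with a small `max_blocks` on a timer keeps `pending()` low, so the final copy at migration time is short. Rewriting the same block several times between calls costs only one transfer, because the dirty set records block numbers rather than individual writes.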
Harry Butterworth
2006-Jul-03 12:02 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
On Fri, 2006-06-30 at 20:37 +0100, Stephen C. Tweedie wrote:
> Hi,
>
> On Fri, 2006-06-30 at 07:17 -0700, Dan Smith wrote:
> > SCT> It depends on the environment. To support cold/live migration,
> > SCT> having network-attached storage will be required; and file images
> > SCT> on NFS would be an extremely simple-to-setup way to achieve that.
> >
> > Ah, but block devices can play too. With dm-userspace, we could
> > migrate a domain from one machine to another, faulting the needed
> > blocks from its block devices on-demand, and copying the rest in the
> > background. This would give us a peer-to-peer setup where block
> > devices could slowly move from machine to machine, following its
> > owner. Once your block was accessed (or copied in the background),
> > it's local and fast. A peer-to-peer NAS setup.
>
> Could be useful in places, but it introduces a number of new
> dependencies. The destination host now relies on the source host for
> data, so if the source crashes, you crash the destination too; and if
> you power-cycle, how do you track where in your cluster the latest copy
> of the block device is?

It's easy. You run code to coordinate the mapping inside a fault-tolerant virtual machine which persists across node failures and cluster power cycles.

>
> A true NAS solution isolates the Xen hosts from these problems.
>
> --Stephen
Stephen C. Tweedie
2006-Jul-03 14:52 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
Hi,

On Fri, 2006-06-30 at 17:15 -0500, Jerone Young wrote:
> I would ask how exactly do you propose to do this ? Today at least
> file-backed domains seems to be the only real world way of doing
> migrations. Migrating block devices seems a little hairy (what if the
> other machine is already using sda for example), and may not be all that
> practical to do.

The practicality of it is certainly a concern; but for businesses with SANs already deployed that's less of an issue. The issue of multiple users exists for files just as much as for devices, though.

--Stephen
Stephen C. Tweedie
2006-Jul-03 14:56 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
Hi,

On Mon, 2006-07-03 at 13:02 +0100, Harry Butterworth wrote:
> > Could be useful in places, but it introduces a number of new
> > dependencies. The destination host now relies on the source host for
> > data, so if the source crashes, you crash the destination too; and if
> > you power-cycle, how do you track where in your cluster the latest copy
> > of the block device is?
>
> It's easy. You run code to coordinate the mapping inside a
> fault-tolerant virtual machine which persists across node failures and
> cluster power cycles.

Right, you just made the point I was making --- you've introduced dependency on a new hypothetical fault-tolerant, cluster-aware device layer. :-)

In principle, with the right software, and configuring your entire infrastructure from scratch, this sort of device-based mechanism may work very well.

But today, with my existing storage already set up, the only way I can easily add Xen migration capabilities to my network, taking advantage of the existing storage server I have, is to use NFS from that server. I just don't have any block-level SAN configured. *That* is why NFS is important --- not because it's necessarily the better choice, but that it's one of the configurations we can expect users to have already.

Conversely, for users with SANs already, whether running over iSCSI or FC or whatever, block-level migration will be needed. It's a matter of being able to use existing solutions rather than mandating a new storage configuration.

--Stephen
Harry Butterworth
2006-Jul-03 15:40 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
On Mon, 2006-07-03 at 15:56 +0100, Stephen C. Tweedie wrote:
> Hi,
>
> On Mon, 2006-07-03 at 13:02 +0100, Harry Butterworth wrote:
>
> > > Could be useful in places, but it introduces a number of new
> > > dependencies. The destination host now relies on the source host for
> > > data, so if the source crashes, you crash the destination too; and if
> > > you power-cycle, how do you track where in your cluster the latest copy
> > > of the block device is?
> >
> > It's easy. You run code to coordinate the mapping inside a
> > fault-tolerant virtual machine which persists across node failures and
> > cluster power cycles.
>
> Right, you just made the point I was making --- you've introduced
> dependency on a new hypothetical fault-tolerant, cluster-aware device
> layer. :-)

Yes, well I said we were going to need one of these about a year and a half ago. We should really have had it finished by now ;-P

>
> In principle, with the right software, and configuring your entire
> infrastructure from scratch, this sort of device-based mechanism may
> work very well.

Yes. It does. Here's one we prepared earlier:
http://www-03.ibm.com/press/us/en/pressrelease/19705.wss

> But today, with my existing storage already set up, the only way I can
> easily add Xen migration capabilities to my network, taking advantage of
> the existing storage server I have, is to use NFS from that server. I
> just don't have any block-level SAN configured. *That* is why NFS is
> important --- not because it's necessarily the better choice, but that
> it's one of the configurations we can expect users to have already.
>
> Conversely, for users with SANs already, whether running over iSCSI or
> FC or whatever, block-level migration will be needed. It's a matter of
> being able to use existing solutions rather than mandating a new storage
> configuration.

I agree that it's generally most important to have solutions that work now. I'm just taking an opportunity to get people thinking about how to solve the kind of problems exemplified by the block device migration above; of which there are quite a few other examples in Xen.

>
> --Stephen
Andrew Warfield
2006-Jul-04 19:39 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
(Reordering quotes from these last two replies:)

From Stephen:
> > Conversely, for users with SANs already, whether running over iSCSI or
> > FC or whatever, block-level migration will be needed. It's a matter of
> > being able to use existing solutions rather than mandating a new storage
> > configuration.

If you have location transparency between the VM and the storage then NFS and SANs should both work just fine without block migration. Aside from some minor reconfig in dom0 as part of the movement, I don't see why you think it's going to be needed here -- I've done this with both GNBD and iSCSI just fine. At least insofar as I'm reading "block-level" migration to mean "copying the blocks over to the new physical host" -- this is how I took Dan to mean this initially.

Now, in situations where the disk is fate-sharing with the CPU that the VM is running on (e.g. you are using a local disk and want to migrate VMs to turn the physical machine off for service), then it seems like some form of block migration is obviously required. Something along the lines of DRBD would seem to do a good job of mirroring the disk to a second location in advance of migrating.

I don't think that I see the immediate benefit of the lazy (migrate and fault blocks across on demand) block migration. It doubles your exposure to failure (at least) and adds overhead. The only possible example I can think of is to very temporarily offload a VM that's gone heavily CPU bound onto an unloaded host. Is there a more obviously useful situation that I'm missing?

> > In principle, with the right software, and configuring your entire
> > infrastructure from scratch, this sort of device-based mechanism may
> > work very well.
>
> Yes. It does. Here's one we prepared earlier:
> http://www-03.ibm.com/press/us/en/pressrelease/19705.wss

I rather doubt that anyone who happens to have purchased SVC as an image store is terribly concerned about the ability to lazily copy VM images from one local disk to another.

a.
Dan Smith
2006-Jul-05 00:25 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
AW> I don't think that I see the immediate benefit of the lazy
AW> (migrate and fault blocks across on demand) block migration. It
AW> doubles your exposure to failure (at least) and adds overhead.
AW> The only possible example I can think of is to very temporarily
AW> offload a VM that's gone heavily CPU bound onto an unloaded host.
AW> Is there a more obviously useful situation that I'm missing?

I think the immediate benefit is mostly as a "built-in" feature to allow migration of VMs easily between machines that do not share access to a centralized infrastructure. Right now, if you want to do that, you have to migrate the entire block device or file before you can start the domain on the other side. The lazy migration allows you to get the domain started immediately.

It's probably not insanely useful in an enterprise environment, but it would be a nice feature for Xen to have, and I think it's possible that more enterprise functionality could arise from developing the foundation. Even if you had a centralized block server, you could still benefit from the abilities, by caching blocks locally in a local block device, such as a hard disk. The same infrastructure that provides the P2P lazy-copy migration could be used to provide local caching, and probably more interesting things.

I guess my initial comment was: I would think real enterprise people would use iSCSI and a real SAN to provide access, instead of files on NFS. In that case, perhaps we can give more flexibility than the NFS solution, with better performance.

--
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com
Andrew Warfield
2006-Jul-05 00:48 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
> Even if you
> had a centralized block server, you could still benefit from the
> abilities, by caching blocks locally in a local block device, such as
> a hard disk. The same infrastructure that provides the P2P lazy-copy
> migration could be used to provide local caching, and probably more
> interesting things.

Sure, local caching certainly makes sense here and I think there's plenty of room to demonstrate benefit with using, but not depending on, local disk.

> I guess my initial comment was: I would think real enterprise people
> would use iSCSI and a real SAN to provide access, instead of files on
> NFS. In that case, perhaps we can give more flexibility than the NFS
> solution, with better performance.

The concern that I have heard to motivate NFS is that vmware (and to a lesser degree virtual server) have effectively trained administrators to expect to manage VMs as image files (with vmdk/vhd). So people understand how to configure NFS, and they understand how to backup/snapshot/dup images using unix 'cp'. It's a largely non-technical concern, and I agree that you could do cunning FS hacks to achieve the same sort of interface to LUNs or LVM volumes. Still, a lot of enterprise admins seem to be very attached to NFS and to a FS-level interface to their images, and already have a lot of home-baked-goods to interact with them that way.

To punctuate this (and somebody please correct me if this is inaccurate...) I think that VMware have only just started supporting iSCSI in the recent release of esx/infrastructure -- so across the board of enterprise installs this is all reasonably new ground for existing users.

a.
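"Using, but not depending on, local disk" suggests a write-through cache: the central block server always holds the authoritative copy, and the local disk only speeds up reads. The Python sketch below is a hedged illustration of that design choice, not blktap or dm-userspace code; `WriteThroughCache` is a hypothetical name, the server is modelled as a dict, and the LRU eviction policy is an assumption of the sketch.

```python
from collections import OrderedDict
from typing import Dict


class WriteThroughCache:
    """Bounded local block cache in front of an authoritative block server."""

    def __init__(self, server: Dict[int, bytes], capacity: int) -> None:
        self._server = server    # stands in for the central block server
        self._capacity = capacity
        self._cache: "OrderedDict[int, bytes]" = OrderedDict()
        self.server_reads = 0

    def read(self, idx: int) -> bytes:
        if idx in self._cache:
            self._cache.move_to_end(idx)  # mark as most recently used
            return self._cache[idx]
        data = self._server[idx]
        self.server_reads += 1
        self._store(idx, data)
        return data

    def write(self, idx: int, data: bytes) -> None:
        # Server first: the cache never holds the only copy of a block.
        self._server[idx] = data
        self._store(idx, data)

    def _store(self, idx: int, data: bytes) -> None:
        self._cache[idx] = data
        self._cache.move_to_end(idx)
        while len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used

    def drop_cache(self) -> None:
        """Simulate losing the local disk; no data is lost."""
        self._cache.clear()
```

Because every write reaches the server before the cache, losing the local disk costs only performance, and a VM can move to another host without first flushing anything.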
Harry Butterworth
2006-Jul-05 01:40 UTC
Re: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)
On Tue, 2006-07-04 at 12:39 -0700, Andrew Warfield wrote:
> > > In principle, with the right software, and configuring your entire
> > > infrastructure from scratch, this sort of device-based mechanism may
> > > work very well.
> >
> > Yes. It does. Here's one we prepared earlier:
> > http://www-03.ibm.com/press/us/en/pressrelease/19705.wss
>
> I rather doubt that anyone who happens to have purchased SVC as an
> image store is terribly concerned about the ability to lazily copy VM
> images from one local disk to another.

Stephen was talking about a hypothetical cluster aware device infrastructure and I was pointing out that cluster aware device infrastructures were already a solved problem and only hypothetical in the sense that there isn't an open source implementation of one yet.

I was also pointing out that the technique used to create the cluster aware device infrastructure for SVC which is publicly written up (amongst other things) here

http://www.research.ibm.com/journal/sj/422/glider.pdf

but better described in purest form here

http://portal.acm.org/citation.cfm?id=279227.279229

can also conveniently be used to solve almost all the difficult clustering problems in clustered Xen deployments of which there will be many -- including the problem of making lazy migrations between local disks on different physical machines sufficiently robust to allow an enterprise class customer to consider using the feature should we choose to implement it.

Harry.