On 03/27/2015 09:35 AM, Richard W.M. Jones wrote:
>>> What I care about is connecting libguestfs to qemu and reading a
>>> snapshot at some point in time, even though the guest is still
>>> writing away to its disks.  Is this possible with drive-backup (or
>>> otherwise)?
>>
>> Yes, that is what drive-backup does.
>>
>> New writes coming from the guest are held up until the old data has
>> been written to the NBD target.
>>
>> That way you get a point-in-time snapshot while the guest continues
>> running.
>
> I understand how that can work for backups, where you want to copy
> a whole disk consistently.
>
> But libguestfs doesn't want to do a backup, nor get a copy of the
> whole disk, it just wants to access a scattering of blocks (maybe a
> few hundred) but at a single point in time, in as lightweight a
> manner as possible.

If you KNOW what sectors you want to read, then your NBD target can
ignore writes to the sectors you don't care about (the guest is
changing data on a sector you don't care about; yeah, bandwidth was
spent in telling you that, but you don't have to spend storage on it),
while focusing on reading the sectors you do care about as fast as
possible (to minimize the time spent dealing with uninteresting
writes).

If you DON'T know what sectors you want to read (because you are
chasing file system pointers and don't know a priori where those
pointers will resolve), then tracking ALL data flushed by guest writes
IS the most efficient manner for keeping your point in time accurate
for the duration of your fleecing operation.

Either way, if you really are going to read only a few hundred sectors
and then close the connection, it shouldn't matter if the drive-backup
failed to send all guest sectors modified after the point in time, so
long as every sector you read is accurate (either because the guest
hasn't touched it in the meantime; or because even though the guest
touched it after the point in time, you were given the original
contents through your NBD target).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
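
To make the copy-before-write mechanism above concrete, a minimal
sketch in Python of issuing drive-backup over QMP with an NBD target.
The socket path and drive id are hypothetical, the exact arguments for
an NBD target are an assumption, and a real client would loop on
recv() until each JSON reply is complete; this only illustrates the
shape of the command.

    import json, socket

    def qmp(sock, cmd, **arguments):
        # Send one QMP command and return qemu's reply (assumes the
        # whole JSON reply arrives in a single recv(), which a real
        # client must not rely on).
        sock.sendall(json.dumps({"execute": cmd,
                                 "arguments": arguments}).encode())
        return json.loads(sock.recv(65536).decode())

    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect("/tmp/qmp.sock")        # hypothetical QMP monitor socket
    s.recv(65536)                     # discard the QMP greeting banner
    qmp(s, "qmp_capabilities")        # leave capabilities negotiation

    # sync=full streams the whole disk; guest writes that race with
    # the stream are held up until the old data has reached the NBD
    # target, preserving the point in time.
    print(qmp(s, "drive-backup",
              device="drive-virtio-disk0",   # hypothetical drive id
              target="nbd:localhost:10809",  # the custom NBD server
              sync="full",
              mode="existing"))              # target already exists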
On Fri, Mar 27, 2015 at 10:37:44AM -0600, Eric Blake wrote:
> On 03/27/2015 09:35 AM, Richard W.M. Jones wrote:
> > But libguestfs doesn't want to do a backup, nor get a copy of the
> > whole disk, it just wants to access a scattering of blocks (maybe
> > a few hundred) but at a single point in time, in as lightweight a
> > manner as possible.
>
> If you KNOW what sectors you want to read, then your NBD target can
> ignore writes to the sectors you don't care about (the guest is
> changing data on a sector you don't care about; yeah, bandwidth was
> spent in telling you that, but you don't have to spend storage on
> it), while focusing on reading the sectors you do care about as fast
> as possible (to minimize the time spent dealing with uninteresting
> writes).  If you DON'T know what sectors you want to read (because
> you are chasing file system pointers and don't know a priori where
> those pointers will resolve), then tracking ALL data flushed by
> guest writes IS the most efficient manner for keeping your point in
> time accurate for the duration of your fleecing operation.  Either
> way, if you really are going to read only a few hundred sectors and
> then close the connection, it shouldn't matter if the drive-backup
> failed to send all guest sectors modified after the point in time,
> so long as every sector you read is accurate (either because the
> guest hasn't touched it in the meantime; or because even though the
> guest touched it after the point in time, you were given the
> original contents through your NBD target).

AIUI:

We'd issue a drive-backup monitor command with an nbd:... target.

The custom NBD server receives a stream of blocks (as writes).

On the other side of this, libguestfs is also talking to the custom
NBD server.  Libguestfs (which is really a qemu process) is issuing
random reads.  There's no way for the NBD server or anything else to
predict what blocks libguestfs will want to read in advance.

In the middle of this is our custom NBD server, probably implemented
using nbdkit.  It has to save all the writes from qemu.  It has to
satisfy the reads from libguestfs, probably by blocking libguestfs
unless we've seen the corresponding write.

The NBD server is going to be (a) storing huge quantities of temporary
data which we'll mostly not use, and (b) blocking libguestfs for
arbitrary periods of time.  This doesn't sound very lightweight to me.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
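
For what it's worth, the store-and-block behaviour described above can
be sketched in a few lines of Python.  This is a toy data structure,
not the actual nbdkit plugin API: every write from qemu lands in a
sector map, and a read blocks until all the sectors it covers have
been seen.  The sketch makes both objections visible: the map grows
without bound, and the wait can last arbitrarily long.

    import threading

    SECTOR = 512

    class FleecingStore:
        """Toy model of the custom NBD server's storage."""
        def __init__(self):
            self.sectors = {}            # sector number -> 512 bytes
            self.cond = threading.Condition()

        def pwrite(self, buf, offset):
            # Called for each block qemu pushes at the NBD target:
            # (a) everything is saved, mostly never to be read back.
            with self.cond:
                for i in range(len(buf) // SECTOR):
                    self.sectors[offset // SECTOR + i] = \
                        buf[i * SECTOR:(i + 1) * SECTOR]
                self.cond.notify_all()   # wake any blocked readers

        def pread(self, count, offset):
            # Called for each random read from libguestfs:
            # (b) blocks until every covered sector has been written,
            # which may be an arbitrarily long time.
            first, n = offset // SECTOR, count // SECTOR
            with self.cond:
                self.cond.wait_for(lambda: all(
                    s in self.sectors
                    for s in range(first, first + n)))
                return b"".join(self.sectors[s]
                                for s in range(first, first + n))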
On 03/27/2015 11:21 AM, Richard W.M. Jones wrote:
>
> AIUI:
>
> We'd issue a drive-backup monitor command with an nbd:... target.
>
> The custom NBD server receives a stream of blocks (as writes).
>
> On the other side of this, libguestfs is also talking to the custom
> NBD server.  Libguestfs (which is really a qemu process) is issuing
> random reads.  There's no way for the NBD server or anything else to
> predict what blocks libguestfs will want to read in advance.
>
> In the middle of this is our custom NBD server, probably implemented
> using nbdkit.  It has to save all the writes from qemu.  It has to
> satisfy the reads from libguestfs, probably by blocking libguestfs
> unless we've seen the corresponding write.

Well, it only has to write the sectors touched by the guest in the
meantime, not the entire disk.  But yeah, a busy guest can cause a lot
of sectors to be written in the meantime.

> The NBD server is going to be (a) storing huge quantities of
> temporary data which we'll mostly not use, and (b) blocking
> libguestfs for arbitrary periods of time.  This doesn't sound very
> lightweight to me.

Hmm.  Sounds a bit like we want to take advantage of postcopy
migration smarts - where the destination receives the full stream of
writes at low priority, but can interject and request out-of-order
reads to satisfy page faults at high priority.  All reads are
guaranteed to resolve to the correct data, even if it means blocking
the read until the out-of-order page fault is read in, but the
out-of-order processing means that you don't have to wait for the full
stream to take place before you get the information you need at the
moment.

Is NBD bi-directional, in that the target can receive write requests
at the same time it is sending read requests?  It sounds like that is
what we need.  Or are we stuck with NBD being uni-directional, where
the target can receive read and write commands, but can't send any
commands back to the client in charge of the data being written?

If that's the case, maybe the work in qemu 2.4 towards persistent
dirty bitmaps can help: set up a bitmap before starting the NBD
server, to track what the guest is dirtying.  With the bitmap in
place, for every sector you want, you first read it directly from the
source image, THEN check the persistent dirty bitmap to see if the
sector has been marked for transfer to NBD.  If so, then you'll have
to wait for it to show up on the NBD target side; if not, then the
guest hasn't touched it yet, so you know what you read is correct.
That still doesn't help optimizing out the writes to the NBD target
for sectors you don't care about, and doesn't quite address the desire
for making random reads take priority over linear streaming of dirty
blocks, but it might help.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
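
The read-then-check ordering in that last idea is what makes it
race-free, and it is easy to get backwards, so a small sketch may help
(continuing in Python; bitmap_is_dirty() and the store holding
drive-backup's stream are hypothetical stand-ins, not a real qemu
interface):

    def read_point_in_time(source, store, bitmap_is_dirty,
                           offset, count):
        # 1. Read the live image FIRST ...
        source.seek(offset)
        data = source.read(count)
        # 2. ... THEN consult the dirty bitmap.  If the range is
        # still clean now, the guest had not written it when we read
        # above, so the live data *is* the point-in-time data.
        if not bitmap_is_dirty(offset, count):
            return data
        # Dirty: the guest overwrote this range after the snapshot
        # point, so wait for the original contents to arrive from the
        # drive-backup stream (e.g. the FleecingStore sketch above).
        return store.pread(count, offset)

Checking the bitmap before reading would be racy: the guest could
dirty the range between the check and the read, and the read would
then return post-snapshot data.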