thr3ads.net - Libguestfs - [Libguestfs] Inspection of disk snapshots [Mar 2015]

NoxDaFox

2015-Mar-23 14:34 UTC

[Libguestfs] Inspection of disk snapshots

Greetings,

I have the following typical scenario: given one or more qcow2 base images,
I clone them with COW and start the VMs.

At a certain point I'd like to inspect them in order to see how they have
evolved compared to the known base images. To do so I was thinking about
taking a disk snapshot of each VM and inspecting its content through
libguestfs (using its Python bindings).

Obviously I need the base image in order for libguestfs to correctly guess
the OS, the FS structure, etc. The problem is that when I inspect the disk
I get the whole disk state, including the base image content (30K+ files
and directories).

This is not a blocker, but it's a very heavy operation considering that some
of the snapshots are a few megabytes while the base images are several
gigabytes.

Is there a way to programmatically instruct libguestfs to limit the
inspection to the snapshot alone?
Would it work as well with other disk formats (vmdk linked clones, for
example)?

For better comprehension I'll show the sequence I'm following (I might be
doing it wrong, of course):

virsh -c "qemu:///system" snapshot-create --disk-only <domain-ID>

I get the snapshot location from its XML description and then:

qemu-img convert -f qcow2 -O qcow2 base_image.qcow2 snapshot.qcow2

At that point I mount it through libguestfs and inspect its content.

Thank you.

Richard W.M. Jones

2015-Mar-23 22:41 UTC

Re: [Libguestfs] Inspection of disk snapshots

On Mon, Mar 23, 2015 at 04:34:21PM +0200, NoxDaFox wrote:
> Greetings,
>
> I have the following typical scenario: given one or more qcow2 base images
> I clone them with COW and start the VMs.
>
> At a certain point I'd like to inspect them in order to see their evolution
> compared to the known base images. To do so I was thinking about taking a
> disk snapshot of each VM and inspect its content through libguestfs (using
> it's Python bindings).
>
> Obviously I need the base image in order for libguestfs to correctly guess
> the OS, the FS structure etc.. Problem is that that point when I inspect
> the disk I get the whole disk state including the base image content (30K+
> files and directories).
>
> This is not an issue but it's a very heavy operation considering that some
> of the snapshots are few megabytes while the base images are several
> gigabytes.
>
> Is there a way to programmatically instruct libguestfs to limit the
> inspection to the sole snapshot?
> Would it work as well with other disk format (vmdk linked clones for
> example)?
>
> For better comprehension I'll show the sequence I'm doing (I might do it
> wrong of course):
>
> virsh -c "qemu:///system" snapshot-create --disk-only <domain-ID>
>
> I get the snapshot location from its XML description and then:
>
> qemu-img convert -f qcow2 -O qcow2 base_image.qcow2 snapshot.qcow2

This makes a copy of the whole disk image.  It's also not a consistent
(point in time) copy.

> At that point I mount it through libguestfs and inspect its content.

As long as you use the 'readonly=1' flag (which is really *essential*,
else you'll get disk corruption), you can just point libguestfs at the
base image:

  import guestfs

  g = guestfs.GuestFS (python_return_dict=True)
  g.add_drive_opts ("base_image.qcow2", format="qcow2", readonly=1)

That also doesn't get you a consistent snapshot, but it'll work most
of the time, and give you a clear error in libguestfs when it doesn't
(and won't corrupt your base disk or anything like that, provided
you're using readonly=1).

The effect of the readonly=1 flag is to create an external snapshot.
It is roughly the equivalent of doing:

  qemu-img create -f qcow2 -b base_image snapshot.qcow2
  < point libguestfs at snapshot.qcow2 >
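
As a minimal sketch of the read-only approach described above: the helper names `open_readonly` and `mount_inspected_root` are illustrative and not part of libguestfs, and the code assumes the `guestfs` Python module is installed and that only the first detected operating system is of interest.

```python
def open_readonly(path, fmt="qcow2"):
    """Open a disk image read-only and return a launched handle."""
    import guestfs  # imported lazily so the helper below works without it

    g = guestfs.GuestFS(python_return_dict=True)
    # readonly=1 makes libguestfs write to a temporary overlay, never to `path`.
    g.add_drive_opts(path, format=fmt, readonly=1)
    g.launch()
    return g


def mount_inspected_root(g):
    """Mount every filesystem of the first detected OS read-only.

    Returns the list of (mountpoint, device) pairs in mount order.
    """
    roots = g.inspect_os()
    if not roots:
        raise RuntimeError("no operating system found in the disk image")
    mountpoints = g.inspect_get_mountpoints(roots[0])
    # Mount shorter paths first so that "/" is mounted before "/boot", etc.
    ordered = sorted(mountpoints.items(), key=lambda kv: len(kv[0]))
    for mountpoint, device in ordered:
        g.mount_ro(device, mountpoint)
    return ordered
```

Typical use would be `g = open_readonly("base_image.qcow2")`, then `mount_inspected_root(g)`, after which calls such as `g.ls("/")` see the guest's files.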

If you want lightweight, consistent, point-in-time snapshots (which it
sounds like you do), qemu has recently been adding this capability.
See the 'drive-backup' monitor command.  I've not tried using that and
I don't know if it is wired up through libvirt, but libguestfs should
be able to consume it since it's just an NBD source.
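
If a copy of the disk is being served over NBD, a libguestfs handle can be pointed at it much like at a local file. The sketch below only builds the arguments for `add_drive_opts`. The helper name, host, port and export name are placeholders, and how the NBD server gets started (for example as the target of 'drive-backup') is outside its scope and untested here.

```python
def nbd_drive_args(host, port, export=""):
    """Return (filename, kwargs) for GuestFS.add_drive_opts on an NBD source.

    With protocol="nbd", the filename is the export name (may be empty).
    """
    kwargs = {
        "format": "raw",  # assumption: the NBD export presents raw data
        "protocol": "nbd",
        "server": ["%s:%d" % (host, port)],
        "readonly": 1,
    }
    return export, kwargs
```

It would be used as `filename, kwargs = nbd_drive_args("localhost", 10809)` followed by `g.add_drive_opts(filename, **kwargs)` on a handle that has not been launched yet.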

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

Richard W.M. Jones

2015-Mar-23 22:43 UTC

Re: [Libguestfs] Inspection of disk snapshots

On Mon, Mar 23, 2015 at 10:41:01PM +0000, Richard W.M. Jones wrote:
> On Mon, Mar 23, 2015 at 04:34:21PM +0200, NoxDaFox wrote:
> > I get the snapshot location from its XML description and then:
> > 
> > qemu-img convert -f qcow2 -O qcow2 base_image.qcow2 snapshot.qcow2
> 
> This makes a copy of the whole disk image.  It's also not a consistent
> (point in time) copy.

Oh I see that you're copying the _snapshot_ that you created with
libvirt; it's not a whole disk copy.  There's still not any point in
doing this, and what I said below stands.

> > At that point I mount it through libguestfs and inspect its content.
>
> As long as you use the 'readonly=1' flag (which is really *essential*,
> else you'll get disk corruption), you can just point libguestfs at the
> base image:
>
>   g = guestfs.GuestFS (python_return_dict=True)
>   g.add_drive_opts ("base_image.qcow2", format="qcow2", readonly=1)
>
> That also doesn't get you a consistent snapshot, but it'll work most
> of the time, and give you a clear error in libguestfs when it doesn't
> (and won't corrupt your base disk or anything like that, provided
> you're using readonly=1).
>
> The effect of the readonly=1 flag is to create an external snapshot.
> It is roughly the equivalent of doing:
>
>   qemu-img create -f qcow2 -b base_image snapshot.qcow2
>   < point libguestfs at snapshot.qcow2 >
>
> If you want lightweight, consistent, point-in-time snapshots (which it
> sounds like you do), qemu has recently been adding this capability.
> See the 'drive-backup' monitor command.  I've not tried using that and
> I don't know if it is wired up through libvirt, but libguestfs should
> be able to consume it since it's just an NBD source.

Rich.

NoxDaFox

2015-Mar-24 08:54 UTC

[Libguestfs] Fwd: Inspection of disk snapshots

I was sure I was doing something wrong, as I'm not yet fully familiar with
the QCOW2 snapshot feature and how it interacts with libguestfs.

I'll try to explain the scenario better:

I have several hosts running lots of VMs which are generated from a few base
images: say A, B, C are the base images (backing files) and A1, A2, A*, B1,
B2, B* are the clones on top of which the newly spawned VMs are running.
I need to collect the disk states of the A*, B*, C* machines and see what
has been written there. I don't care about the whole content, as the content
of base images A, B, C is well known to me; the only thing that matters is
the deltas of the new clones.

One more piece of the puzzle is that the inspection does not happen on the
hosts running the VMs but on a dedicated server.

My idea was to collect those "snapshots" (generic term, not the QEMU one)
from the hosts and send them to my inspection server. As A, B and C are
accessible from that server, the only thing I need is to rebase those
snapshots to correctly inspect them through libguestfs, and it proved to
work (I'm using readonly mode as I only care about reading the disks). I'm
not really interested in having a consistent point-in-time state of the
disks: the operation is done several times a day, so I can cope with
semi-consistent data as it can be easily reconstructed.
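
The rebase step mentioned above can be sketched as follows. This is an assumption about how it might be done, not the poster's actual script: it uses `qemu-img rebase -u`, which only rewrites the backing-file reference in the overlay and is safe only if the base image on the inspection server is identical to the one the clone was created from. The function names and file names are placeholders.

```python
import subprocess


def rebase_command(overlay, new_base, base_format="qcow2"):
    """Build a qemu-img command pointing `overlay` at `new_base`."""
    return [
        "qemu-img", "rebase",
        "-u",               # "unsafe" mode: change the reference, copy no data
        "-b", new_base,     # new backing file
        "-F", base_format,  # format of the backing file
        overlay,
    ]


def rebase(overlay, new_base, base_format="qcow2"):
    """Run the rebase; raises CalledProcessError if qemu-img fails."""
    subprocess.run(rebase_command(overlay, new_base, base_format), check=True)
```

For example, `rebase("A1.qcow2", "/srv/images/A.qcow2")` would make the collected clone A1 refer to the inspection server's copy of A.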

My real problem comes when I try to inspect the disk snapshot: libguestfs
will, of course, let me see the whole content of the disks, which means A +
A*. Apart from the waste of CPU time spent looking at files whose state I
already know (the ones contained in A), it generates a lot of noise. A
Linux base image with some libraries installed consists of 20K+ files;
installing something extra (an Apache server, for example) brings just a
few hundred new files, and I'm interested only in those.

So my real question is: is there a way to distinguish the files contained
in the two different disk images (A and A1), or shall I think about a
totally different approach?

Thank you.

Richard W.M. Jones

2015-Mar-24 11:32 UTC

Re: [Libguestfs] Fwd: Inspection of disk snapshots

On Tue, Mar 24, 2015 at 10:54:05AM +0200, NoxDaFox wrote:
> I was sure I was doing something wrong as I'm not yet fully aware of QCOW2
> snapshot feature and how it interacts with libguestfs.
>
> I'll try to explain better the scenario:
>
> I have several hosts running lots of VMs which are generated from few base
> images, say A, B, C the base images (backing file) and A1, A2, A*, B1, B2,
> B* clones on top of which the newly spawned VMs are running.
> I need to collect the disk states of A*, B*, C* machines and see what has
> been written there. I don't care about the whole content as the base images
> content A, B, C are well known to me, only thing it matters are the deltas
> of the new clones.
>
> One more piece in the puzzle is that the inspection does not happen on the
> hosts running the VMs but on a dedicated server.
>
> My idea was to collect those "snapshots" (generic term not the QEMU one)
> from the hosts and send them to my inspection server. As A, B and C are
> accessible from that server only thing I need is to rebase those snapshot
> to correctly inspect them through libguestfs, and it proved to work (I'm
> using readonly mode as I only care about reading the disks). I'm not really
> interested in having consistent point-in-time state of the disks as the
> operation is done several times a day so I can cope with semi-consistent
> data as it can be easily re-constructed.
>
> My real problem comes when I try to inspect the disk snapshot: libguestfs
> will, of course, let me see the whole content of the disks, which means A +
> A*. Apart from the waste of CPU time spend on looking at files I already
> know the state (the ones contained in A), it generates a lot of noise. A
> Linux base image with some library installed consists in 20+ K files,
> installing something extra (Apache server for example) just brings some
> hundreds new files and I'm interested only in those ones.
>
> So my real question is: is there a way to distinguish the files contained
> in the two different disk images (A and A1) or shall I think about a
> totally different approach?

Well we have a tool called virt-diff
(http://libguestfs.org/virt-diff.1.html) which prints the differences
between two disks.  It's quite commonly used to show the differences
between an original base image and a snapshot taken some time later,
so you can tell which files have been modified by the guest.

Now virt-diff works by opening both disks, reading all of the metadata
(or even the file content if you use the --checksum option), and then
internally diffing it and presenting the result.
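
From Python, the simplest way to get this result is probably to run virt-diff as a subprocess. A minimal sketch, assuming virt-diff is installed on the inspection server; the function names and file names are placeholders:

```python
import subprocess


def virt_diff_command(base, clone, checksum=False):
    """Build a virt-diff command comparing `base` (-a) against `clone` (-A)."""
    cmd = ["virt-diff", "-a", base, "-A", clone]
    if checksum:
        # Compare file contents too, not just metadata (slower).
        cmd.append("--checksum")
    return cmd


def virt_diff(base, clone, checksum=False):
    """Run virt-diff and return its textual report."""
    result = subprocess.run(
        virt_diff_command(base, clone, checksum),
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, `print(virt_diff("A.qcow2", "A1.qcow2"))` would print the files that differ between base image A and clone A1.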

Of course this means it's not especially fast, but it's the way that
it has to work: The snapshot doesn't contain "files which changed", it
contains underlying device blocks which changed.  It operates a whole
layer or two below the filesystem.

To do this from Python is not particularly hard, but you'll have to
read the C and translate it.  The guts of the algorithm are in the
recursive "visitor" mini-library:

https://github.com/libguestfs/libguestfs/blob/master/diff/diff.c
https://github.com/libguestfs/libguestfs/blob/master/cat/visit.h
https://github.com/libguestfs/libguestfs/blob/master/cat/visit.c
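
A rough Python equivalent of that approach, heavily simplified compared with the C visitor, is sketched here: walk each mounted guest with `ls` and `lstatns`, collect a few metadata fields per path, and diff the two dictionaries. The function names are illustrative, the choice of compared fields (mode, size, mtime) is an assumption, and `g` is expected to be a launched handle with the guest filesystems already mounted.

```python
import stat


def collect_metadata(g, directory="/"):
    """Recursively map each path under `directory` to (mode, size, mtime)."""
    entries = {}
    for name in g.ls(directory):
        path = directory.rstrip("/") + "/" + name
        st = g.lstatns(path)
        entries[path] = (st["st_mode"], st["st_size"], st["st_mtime_sec"])
        # Descend into real directories only; lstat does not follow symlinks.
        if stat.S_ISDIR(st["st_mode"]):
            entries.update(collect_metadata(g, path))
    return entries


def diff_metadata(base, clone):
    """Return (added, removed, changed) sorted path lists between two maps."""
    added = sorted(p for p in clone if p not in base)
    removed = sorted(p for p in base if p not in clone)
    changed = sorted(p for p in clone if p in base and clone[p] != base[p])
    return added, removed, changed
```

One call per file is slow on a tree with tens of thousands of entries; batching the stat calls per directory (for instance with `lstatnslist`) would likely help, but is left out to keep the sketch short.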

There are alternatives -- perhaps parsing the qcow2 snapshot, and
mapping disk blocks back to files -- but they won't be very easy to
implement.  I wrote a highly experimental* tool called 'virt-bmap' that
may be of interest:

https://rwmj.wordpress.com/2014/11/23/mapping-files-to-disk/
https://rwmj.wordpress.com/2014/11/24/mapping-files-to-disk-part-2/
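
For a sense of what the block-level view looks like, `qemu-img map --output=json` reports which extents of an image chain are stored in which layer. The sketch below is not virt-bmap and does not map blocks back to files; it only sums the bytes stored in the top layer (depth 0), which indicates how much the clone has written. The function names are illustrative.

```python
import json
import subprocess


def overlay_bytes(map_json):
    """Sum the lengths of data extents stored in the top layer (depth 0)."""
    extents = json.loads(map_json)
    return sum(e["length"] for e in extents
               if e.get("depth") == 0 and e.get("data"))


def map_image(path):
    """Return the JSON extent map of `path` as text (requires qemu-img)."""
    result = subprocess.run(
        ["qemu-img", "map", "--output=json", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, `overlay_bytes(map_image("A1.qcow2"))` would give the number of bytes clone A1 holds on top of its backing file.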

Rich.

* = if it breaks, you get to keep all the pieces
