Nir Soffer
2021-Jun-11 23:39 UTC
[Libguestfs] [PATCH 2/2] nbd: Add new qemu:joint-allocation metadata context
On Wed, Jun 9, 2021 at 9:01 PM Eric Blake <eblake at redhat.com> wrote:> > When trying to reconstruct a qcow2 chain using information provided > over NBD, ovirt had been relying on an unsafe assumption that any > portion of the qcow2 file advertised as sparse would defer to the > backing image; this worked with what qemu 5.2 reports for a qcow2 BSD > loaded with "backing":null. However, in 6.0, commit 0da9856851 (nbd: > server: Report holes for raw images) also had a side-effect of > reporting unallocated zero clusters in qcow2 files as sparse. This > change is correct from the NBD spec perspective (advertising bits has > always been optional based on how much information the server has > available, and should only be used to optimize behavior when a bit is > set, while not assuming semantics merely because a bit is clear), but > means that a qcow2 file that uses an unallocated zero cluster to > override a backing file now shows up as sparse over NBD, and causes > ovirt to fail to reproduce that cluster (ie. ovirt was assuming it > only had to write clusters where the bit was clear, and the 6.0 > behavior change shows the flaw in that assumption). > > The correct fix is for ovirt to additionally use the > qemu:allocation-depth metadata context added in 5.2: after all, the > actual determination for what is needed to recreate a qcow2 file is > not whether a cluster is sparse, but whether the allocation-depth > shows the cluster to be local. But reproducing an image is more > efficient when handling known-zero clusters, which means that ovirt > has to track both base:allocation and qemu:allocation-depth metadata > contexts simultaneously. While NBD_CMD_BLOCK_STATUS is just fine > sending back information for two contexts in parallel, it comes with > some bookkeeping overhead at the client side: the two contexts need > not report the same length of replies, and it involves more network > traffic.Since this change is not simple, and the chance that we also get the dirty bitmap included in the result seems to be very low, I decided to check the direction of merging multiple extents. I started with merging "base:allocation" and "qemu:dirty-bitmap:xxx" since we already have both. It was not hard to do, although it is not completely tested yet. Here is the merging code: https://gerrit.ovirt.org/c/ovirt-imageio/+/115216/1/daemon/ovirt_imageio/_internal/nbdutil.py To make merging easy and safe, we map the NBD_STATE_DIRTY bit to a private bit so it cannot clash with the NBD_STATE_HOLE bit: https://gerrit.ovirt.org/c/ovirt-imageio/+/115215/1/daemon/ovirt_imageio/_internal/nbd.py Here is a functional test using qemu-nbd showing that it works: https://gerrit.ovirt.org/c/ovirt-imageio/+/115216/1/daemon/test/client_test.py I'll try to use "qemu:allocation-depth" in a similar way next week, probably mapping depth > 0 to EXTENT_EXISTS, to use when reporting holes in single qcow2 images. If this is successful, we can start using this in the next ovirt release, and we don't need "qemu:joint-allocation". Nir
Eric Blake
2021-Jun-14 13:56 UTC
[Libguestfs] [PATCH 2/2] nbd: Add new qemu:joint-allocation metadata context
On Sat, Jun 12, 2021 at 02:39:44AM +0300, Nir Soffer wrote:> Since this change is not simple, and the chance that we also get the dirty > bitmap included in the result seems to be very low, I decided to check the > direction of merging multiple extents. > > I started with merging "base:allocation" and "qemu:dirty-bitmap:xxx" since > we already have both. It was not hard to do, although it is not completely > tested yet. > > Here is the merging code: > https://gerrit.ovirt.org/c/ovirt-imageio/+/115216/1/daemon/ovirt_imageio/_internal/nbdutil.py > > To make merging easy and safe, we map the NBD_STATE_DIRTY bit to a private bit > so it cannot clash with the NBD_STATE_HOLE bit: > https://gerrit.ovirt.org/c/ovirt-imageio/+/115215/1/daemon/ovirt_imageio/_internal/nbd.py > > Here is a functional test using qemu-nbd showing that it works: > https://gerrit.ovirt.org/c/ovirt-imageio/+/115216/1/daemon/test/client_test.py > > I'll try to use "qemu:allocation-depth" in a similar way next week, probably > mapping depth > 0 to EXTENT_EXISTS, to use when reporting holes in > single qcow2 images. > > If this is successful, we can start using this in the next ovirt release, and we > don't need "qemu:joint-allocation".That's nice to know. So at this point, we'll drop the patch on qemu:joint-allocation, and instead focus on teh patch that improves qemu-img map output to make it easier to use in the same way that qemu:allocation-depth is. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org