Eric Blake
2021-Feb-22 21:19 UTC
[Libguestfs] [PATCH nbdcopy] copy: Implement extent metadata for efficient copying.
On 11/24/20 7:04 AM, Richard W.M. Jones wrote:> This implements these interconnected options: > > --allocated > --destination-is-zero (alias: --target-is-zero) > --no-extents > --- > TODO | 6 +-Late review inspired by Nir's recent work, oh well.> +++ b/copy/copy-sparse-allocated.sh > @@ -0,0 +1,92 @@> + > +requires nbdkit --version > +requires nbdkit --exit-with-parent --versionDo we really need the first line, since the second covers the same?> +++ b/copy/copy-sparse-no-extents.sh> +requires nbdkit --version > +requires nbdkit --exit-with-parent --versionMultiple instances.> +++ b/copy/copy-sparse.sh > @@ -0,0 +1,97 @@> +# Check the output matches expected. > +if [ "$(cat $out)" != "pwrite 1 4294967296 > +pwrite 32768 0 > +pwrite 32768 1073709056 > +pwrite 32768 4294934528 > +trim 134184960 32768Sorry I didn't notice this earlier, but Nir is on the correct track, and updates this test to expect calls to zero instead (as we cannot guarantee that read-after-trim reads zero, at least not without a further NBD spec extension).> +++ b/copy/file-ops.c > @@ -24,7 +24,16 @@> + > +/* This is done synchronously, but that's fine because commands from > + * the previous work range in flight continue to run, it's difficult > + * to (sanely) start new work until we have the full list of extents, > + * and in almost every case the remote NBD server can answer our > + * request for extents in a single round trip. > + */ > +static void > +nbd_get_extents (struct rw *rw, uintptr_t index, > + uint64_t offset, uint64_t count, > + extent_list *ret) > +{ > + extent_list exts = empty_vector;> + > + /* Copy the extents returned into the final list (ret). This is > + * complicated because the extents returned by the server may > + * begin earlier and begin or end later than the requested size. > + */Can the extents returned by the server actually begin earlier? They can definitely end earlier or later than the reequested size, but it seems like the beginning is always fixed. (nbdkit allows plugins to return early data, but then munges that return to pass the client something that starts from the client's requested offset)> + for (i = 0; i < exts.size; ++i) { > + uint64_t d; > + > + if (exts.ptr[i].offset + exts.ptr[i].length <= offset) > + continue;which makes me wonder if this 'if' was dead code.> + if (exts.ptr[i].offset < offset) { > + d = offset - exts.ptr[i].offset; > + exts.ptr[i].offset += d; > + exts.ptr[i].length -= d; > + assert (exts.ptr[i].offset == offset); > + } > + if (exts.ptr[i].offset + exts.ptr[i].length > offset + count) { > + d = offset + count - exts.ptr[i].offset - exts.ptr[i].length; > + exts.ptr[i].length -= d; > + assert (exts.ptr[i].length == offset + count); > + } > + if (extent_list_append (ret, exts.ptr[i]) == -1) { > + perror ("realloc"); > + exit (EXIT_FAILURE); > + } > + > + offset += exts.ptr[i].length; > + count -= exts.ptr[i].length; > + } > + } > + > + free (exts.ptr); > +} > +> +++ b/copy/nbdcopy.pod > @@ -4,7 +4,9 @@ nbdcopy - copy to and from an NBD server > > =head1 SYNOPSIS > > - nbdcopy [-C N|--connections=N] [--flush] [-p|--progress] > + nbdcopy [--allocated] [-C N|--connections=N] > + [--destination-is-zero|--target-is-zero] > + [--flush] [--no-extents] [-p|--progress] > [-R N|--requests=N] [--synchronous] > [-T N|--threads=N] > SOURCE DESTINATION > @@ -74,6 +76,15 @@ formats use C<qemu-img convert>, see L<qemu-img(1)>. > > Display brief command line help and exit. > > +=item B<--allocated> > + > +Normally nbdcopy tries to create a sparse output (with holes), if the > +destination supports that. It does this in two ways: either using > +extent informtation from the source to copy holes (seeinformation> +I<--no-extents>), or by detecting runs of zeroes (see I<-S>). If you > +use I<--allocated> then nbdcopy creates a fully allocated, non-sparse > +output on the destination. > + > =item B<-C> N > > =item B<--connections=>N > @@ -82,11 +93,30 @@ Set the maximum number of NBD connections ("multi-conn"). By default > nbdcopy will try to use multi-conn with up to 4 connections if the NBD > server supports it. > > +=item B<--destination-is-zero> > + > +=item B<--target-is-zero> > + > +Assume the destination is already zeroed. This allows nbdcopy to skip > +copying blocks of zeroes from the source to the destination. This is > +not safe unless the destination device is already zeroed. > +(I<--target-is-zero> is provided for compatibility with > +L<qemu-img(1)>.) > + > =item B<--flush> > > Flush writes to ensure that everything is written to persistent > storage before nbdcopy exits. > > +=item B<--no-extents> > + > +Normally nbdcopy uses extent metadata to skip over parts of the source > +disk which contain holes. If you use this flag, nbdcopy ignores > +extent information and reads everything, which is usually slower. You > +might use this flag in two situations: the source NBD server has > +incorrect metadata information; or the source has very slow extent > +querying so it's faster to simply read all of the data. > +I'd still love to teach nbdkit how to send NBD_CMD_READ replies with zero chunks (right now, only qemu-nbd does that); even with --no-extents, proper handling of zero chunks can still let us proceed faster than blindly passing full runs of zeros over the network. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2021-Feb-24 10:22 UTC
[Libguestfs] [PATCH nbdcopy] copy: Implement extent metadata for efficient copying.
On Mon, Feb 22, 2021 at 03:19:18PM -0600, Eric Blake wrote:> On 11/24/20 7:04 AM, Richard W.M. Jones wrote: > > This implements these interconnected options: > > > > --allocated > > --destination-is-zero (alias: --target-is-zero) > > --no-extents > > --- > > TODO | 6 +- > > Late review inspired by Nir's recent work, oh well. > > > +++ b/copy/copy-sparse-allocated.sh > > @@ -0,0 +1,92 @@ > > > + > > +requires nbdkit --version > > +requires nbdkit --exit-with-parent --version > > Do we really need the first line, since the second covers the same? > > > +++ b/copy/copy-sparse-no-extents.sh > > > +requires nbdkit --version > > +requires nbdkit --exit-with-parent --version > > Multiple instances.It's not a big deal either way, but the thinking behind this is: either nbdkit might not be installed (which would be caught by the first line) or nbdkit might not have the --exit-with-parent option because it's running on an OS which doesn't support that like Windows. It doesn't really matter for practical purposes either way.> > +++ b/copy/copy-sparse.sh > > @@ -0,0 +1,97 @@ > > > +# Check the output matches expected. > > +if [ "$(cat $out)" != "pwrite 1 4294967296 > > +pwrite 32768 0 > > +pwrite 32768 1073709056 > > +pwrite 32768 4294934528 > > +trim 134184960 32768 > > Sorry I didn't notice this earlier, but Nir is on the correct track, and > updates this test to expect calls to zero instead (as we cannot > guarantee that read-after-trim reads zero, at least not without a > further NBD spec extension).Yup, Nir has now fixed this, thanks :-)> > +++ b/copy/file-ops.c > > @@ -24,7 +24,16 @@ > > > + > > +/* This is done synchronously, but that's fine because commands from > > + * the previous work range in flight continue to run, it's difficult > > + * to (sanely) start new work until we have the full list of extents, > > + * and in almost every case the remote NBD server can answer our > > + * request for extents in a single round trip. > > + */ > > +static void > > +nbd_get_extents (struct rw *rw, uintptr_t index, > > + uint64_t offset, uint64_t count, > > + extent_list *ret) > > +{ > > + extent_list exts = empty_vector; > > > + > > + /* Copy the extents returned into the final list (ret). This is > > + * complicated because the extents returned by the server may > > + * begin earlier and begin or end later than the requested size. > > + */ > > Can the extents returned by the server actually begin earlier? They can > definitely end earlier or later than the reequested size, but it seems > like the beginning is always fixed. (nbdkit allows plugins to return > early data, but then munges that return to pass the client something > that starts from the client's requested offset)Yes they cannot begin earlier, because the NBD_REPLY_TYPE_BLOCK_STATUS message descriptors only contain a length field. We could therefore simplify this.> > + for (i = 0; i < exts.size; ++i) { > > + uint64_t d; > > + > > + if (exts.ptr[i].offset + exts.ptr[i].length <= offset) > > + continue; > > which makes me wonder if this 'if' was dead code.I think so, yes.> > + if (exts.ptr[i].offset < offset) { > > + d = offset - exts.ptr[i].offset; > > + exts.ptr[i].offset += d; > > + exts.ptr[i].length -= d; > > + assert (exts.ptr[i].offset == offset); > > + } > > + if (exts.ptr[i].offset + exts.ptr[i].length > offset + count) { > > + d = offset + count - exts.ptr[i].offset - exts.ptr[i].length; > > + exts.ptr[i].length -= d; > > + assert (exts.ptr[i].length == offset + count); > > + } > > + if (extent_list_append (ret, exts.ptr[i]) == -1) { > > + perror ("realloc"); > > + exit (EXIT_FAILURE); > > + } > > + > > + offset += exts.ptr[i].length; > > + count -= exts.ptr[i].length; > > + } > > + } > > + > > + free (exts.ptr); > > +} > > + > > > +++ b/copy/nbdcopy.pod > > @@ -4,7 +4,9 @@ nbdcopy - copy to and from an NBD server > > > > =head1 SYNOPSIS > > > > - nbdcopy [-C N|--connections=N] [--flush] [-p|--progress] > > + nbdcopy [--allocated] [-C N|--connections=N] > > + [--destination-is-zero|--target-is-zero] > > + [--flush] [--no-extents] [-p|--progress] > > [-R N|--requests=N] [--synchronous] > > [-T N|--threads=N] > > SOURCE DESTINATION > > @@ -74,6 +76,15 @@ formats use C<qemu-img convert>, see L<qemu-img(1)>. > > > > Display brief command line help and exit. > > > > +=item B<--allocated> > > + > > +Normally nbdcopy tries to create a sparse output (with holes), if the > > +destination supports that. It does this in two ways: either using > > +extent informtation from the source to copy holes (see > > informationWill fix.> > +I<--no-extents>), or by detecting runs of zeroes (see I<-S>). If you > > +use I<--allocated> then nbdcopy creates a fully allocated, non-sparse > > +output on the destination. > > + > > =item B<-C> N > > > > =item B<--connections=>N > > @@ -82,11 +93,30 @@ Set the maximum number of NBD connections ("multi-conn"). By default > > nbdcopy will try to use multi-conn with up to 4 connections if the NBD > > server supports it. > > > > +=item B<--destination-is-zero> > > + > > +=item B<--target-is-zero> > > + > > +Assume the destination is already zeroed. This allows nbdcopy to skip > > +copying blocks of zeroes from the source to the destination. This is > > +not safe unless the destination device is already zeroed. > > +(I<--target-is-zero> is provided for compatibility with > > +L<qemu-img(1)>.) > > + > > =item B<--flush> > > > > Flush writes to ensure that everything is written to persistent > > storage before nbdcopy exits. > > > > +=item B<--no-extents> > > + > > +Normally nbdcopy uses extent metadata to skip over parts of the source > > +disk which contain holes. If you use this flag, nbdcopy ignores > > +extent information and reads everything, which is usually slower. You > > +might use this flag in two situations: the source NBD server has > > +incorrect metadata information; or the source has very slow extent > > +querying so it's faster to simply read all of the data. > > + > > I'd still love to teach nbdkit how to send NBD_CMD_READ replies with > zero chunks (right now, only qemu-nbd does that); even with > --no-extents, proper handling of zero chunks can still let us proceed > faster than blindly passing full runs of zeros over the network.Structured replies? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/