Nir Soffer
2020-Jun-28 13:46 UTC
Re: [Libguestfs] [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.
On Sun, Jun 28, 2020 at 4:03 PM Richard W.M. Jones <rjones@redhat.com> wrote: ...> + > +static int > +tar_get_ready (void) > +{ > + FILE *fp; > + CLEANUP_FREE char *cmd = NULL; > + size_t len = 0; > + bool scanned_ok; > + char s[256]; > + > + /* Construct the tar command to examine the tar file. */ > + fp = open_memstream (&cmd, &len); > + if (fp == NULL) { > + nbdkit_error ("open_memstream: %m"); > + return -1; > + } > + fprintf (fp, "LANG=C tar --no-auto-compress -tRvf ");Using -R is nice, but is block size documented? Also --block-number would be nicer.> + shell_quote (tarfile, fp);This is questionable. Why use string and quote the string instead of using argv directly?> + fputc (' ', fp); > + shell_quote (file, fp); > + if (fclose (fp) == EOF) { > + nbdkit_error ("memstream failed: %m"); > + return -1; > + } > + > + /* Run the command and read the first line of the output. */ > + nbdkit_debug ("%s", cmd); > + fp = popen (cmd, "r"); > + if (fp == NULL) { > + nbdkit_error ("tar: %m"); > + return -1; > + } > + scanned_ok = fscanf (fp, "block %" SCNu64 ": %*s %*s %" SCNu64, > + &offset, &size) == 2; > + /* We have to now read and discard the rest of the output until EOF. */ > + while (fread (s, sizeof s, 1, fp) > 0) > + ; > + if (pclose (fp) != 0) { > + nbdkit_error ("tar subcommand failed, " > + "check that the file really exists in the tarball"); > + return -1; > + } > + > + if (!scanned_ok) { > + nbdkit_error ("unexpected output from the tar subcommand"); > + return -1; > + } > + > + /* Adjust the offset: Add 1 for the tar header, then multiply by the > + * block size. > + */ > + offset = (offset+1) * 512; > + > + nbdkit_debug ("tar: offset %" PRIu64 ", size %" PRIu64, offset, size); > + > + /* Check it looks sensible. XXX We ought to check it doesn't exceed > + * the size of the tar file. > + */ > + if (offset >= INT64_MAX || size >= INT64_MAX) { > + nbdkit_error ("internal error: calculated offset and size are wrong"); > + return -1; > + } > + > + return 0; > +} > + > +struct handle { > + int fd; > +}; > + > +static void * > +tar_open (int readonly) > +{ > + struct handle *h; > + > + assert (offset > 0); /* Cannot be zero because of tar header. */ > + > + h = calloc (1, sizeof *h); > + if (h == NULL) { > + nbdkit_error ("calloc: %m"); > + return NULL; > + } > + h->fd = open (tarfile, (readonly ? O_RDONLY : O_RDWR) | O_CLOEXEC); > + if (h->fd == -1) { > + nbdkit_error ("%s: %m", tarfile); > + free (h); > + return NULL; > + } > + > + return h; > +} > + > +#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL > + > +/* Get the file size. */ > +static int64_t > +tar_get_size (void *handle) > +{ > + return size; > +} > + > +/* Serves the same data over multiple connections. */ > +static int > +tar_can_multi_conn (void *handle) > +{ > + return 1; > +} > + > +static int > +tar_can_cache (void *handle) > +{ > + /* Let nbdkit call pread to populate the file system cache. */ > + return NBDKIT_CACHE_EMULATE; > +} > + > +/* Read data from the file. */ > +static int > +tar_pread (void *handle, void *buf, uint32_t count, uint64_t offs)This should be identical to file plugin, on? can we reuse the same code for reading files?> +{ > + struct handle *h = handle; > + > + offs += offset; > + > + while (count > 0) { > + ssize_t r = pread (h->fd, buf, count, offs); > + if (r == -1) { > + nbdkit_error ("pread: %m"); > + return -1; > + } > + if (r == 0) { > + nbdkit_error ("pread: unexpected end of file"); > + return -1; > + } > + buf += r; > + count -= r; > + offs += r; > + } > + > + return 0; > +} > + > +/* Write data to the file. */ > +static int > +tar_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offs) > +{ > + struct handle *h = handle; > + > + offs += offset; > + > + while (count > 0) { > + ssize_t r = pwrite (h->fd, buf, count, offs); > + if (r == -1) { > + nbdkit_error ("pwrite: %m"); > + return -1; > + } > + buf += r; > + count -= r; > + offs += r; > + } > + > + return 0; > +} > + > +static struct nbdkit_plugin plugin = { > + .name = "tar", > + .longname = "nbdkit tar plugin", > + .version = PACKAGE_VERSION, > + .unload = tar_unload, > + .config = tar_config, > + .config_complete = tar_config_complete, > + .config_help = tar_config_help, > + .magic_config_key = "tar", > + .get_ready = tar_get_ready, > + .open = tar_open, > + .get_size = tar_get_size, > + .can_multi_conn = tar_can_multi_conn, > + .can_cache = tar_can_cache, > + .pread = tar_pread, > + .pwrite = tar_pwrite, > + .errno_is_preserved = 1, > +}; > + > +NBDKIT_REGISTER_PLUGIN(plugin) > diff --git a/tests/test-dump-plugin.sh b/tests/test-dump-plugin.sh > index 0b4c1ce1..6eb25a65 100755 > --- a/tests/test-dump-plugin.sh > +++ b/tests/test-dump-plugin.sh > @@ -46,7 +46,7 @@ do_test () > python-valgrind | ruby-valgrind | tcl-valgrind) > echo "$0: skipping $1$vg because this language doesn't support valgrind" > ;; > - example4* | tar*) > + example4*) > # These tests are written in Perl so we have to check that > # the Perl plugin was compiled. > if nbdkit perl --version; then run_test $1; fi > diff --git a/tests/test-help-plugin.sh b/tests/test-help-plugin.sh > index 7dc26ece..f0dfa7df 100755 > --- a/tests/test-help-plugin.sh > +++ b/tests/test-help-plugin.sh > @@ -46,7 +46,7 @@ do_test () > python-valgrind | ruby-valgrind | tcl-valgrind) > echo "$0: skipping $1$vg because this language doesn't support valgrind" > ;; > - example4* | tar*) > + example4*) > # These tests are written in Perl so we have to check that > # the Perl plugin was compiled. > if nbdkit perl --version; then run_test $1; fi > diff --git a/tests/test-tar-info.sh b/tests/test-tar-info.sh > new file mode 100755 > index 00000000..efaa8ec8 > --- /dev/null > +++ b/tests/test-tar-info.sh > @@ -0,0 +1,67 @@ > +#!/usr/bin/env bash > +# nbdkit > +# Copyright (C) 2017-2020 Red Hat Inc. > +# > +# Redistribution and use in source and binary forms, with or without > +# modification, are permitted provided that the following conditions are > +# met: > +# > +# * Redistributions of source code must retain the above copyright > +# notice, this list of conditions and the following disclaimer. > +# > +# * Redistributions in binary form must reproduce the above copyright > +# notice, this list of conditions and the following disclaimer in the > +# documentation and/or other materials provided with the distribution. > +# > +# * Neither the name of Red Hat nor the names of its contributors may be > +# used to endorse or promote products derived from this software without > +# specific prior written permission. > +# > +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND > +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, > +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A > +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR > +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF > +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND > +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, > +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT > +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > +# SUCH DAMAGE. > + > +# Test that qemu-img info works on a qcow2 file in a tar file. > + > +source ./functions.sh > +set -e > +set -x > + > +requires test -f disk > +requires guestfish --version > +requires tar --version > +requires qemu-img --version > +requires qemu-img info --output=json /dev/null > +requires jq --version > +requires stat --version > + > +disk=tar-info-disk.qcow2 > +out=tar-info.out > +tar=tar-info.tar > +files="$disk $out $tar" > +rm -f $files > +cleanup_fn rm -f $files > + > +# Create a tar file containing a known qcow2 file. > +qemu-img convert -f raw disk -O qcow2 $disk > +tar cf $tar $disk > + > +# Run nbdkit. > +nbdkit -U - tar $tar file=$disk --run 'qemu-img info --output=json $nbd' > $out > +cat $out > + > +# Check various fields in the input. > +# Virtual size must be the same as the size of the original raw disk. > +test "$( jq -r -c '.["virtual-size"]' $out )" -eq "$( stat -c %s disk )" > + > +# Format must be qcow2. > +test "$( jq -r -c '.["format"]' $out )" = "qcow2" > diff --git a/tests/test-tar.sh b/tests/test-tar.sh > index c6d726c4..3164b826 100755 > --- a/tests/test-tar.sh > +++ b/tests/test-tar.sh > @@ -38,23 +38,27 @@ requires test -f disk > requires guestfish --version > requires tar --version > > -# The tar plugin requires some Perl modules, this checks if they are > -# installed. > -requires perl -MCwd -e 1 > -requires perl -MIO::File -e 1 > - > sock=`mktemp -u` > files="tar.pid tar.tar $sock" > rm -f $files > cleanup_fn rm -f $files > > -# Create a tar file containing the disk image. > -tar cf tar.tar disk > +# Create a tar file containing the disk image plus some other random > +# files that hopefully will be ignored. > +tar cf tar.tar test-tar.sh Makefile disk Makefile.am > +tar tvvf tar.tar > > # Run nbdkit. > start_nbdkit -P tar.pid -U $sock tar tar=tar.tar file=disk > > -# Now see if we can open the disk from the tar file. > -guestfish -x --ro --format=raw -a "nbd://?socket=$sock" -m /dev/sda1 <<EOF > +# Now see if we can open, read and write the disk from the tar file. > +guestfish -x --format=raw -a "nbd://?socket=$sock" -m /dev/sda1 <<EOF > cat /hello.txt > + > + # Write a new file. > + write /test.txt "hello" > + cat /test.txt > EOF > + > +# Check that the tar file isn't corrupt. > +tar tvvf tar.tar > diff --git a/tests/test-version-plugin.sh b/tests/test-version-plugin.sh > index 7d2ab072..c397afc2 100755 > --- a/tests/test-version-plugin.sh > +++ b/tests/test-version-plugin.sh > @@ -46,7 +46,7 @@ do_test () > python-valgrind | ruby-valgrind | tcl-valgrind) > echo "$0: skipping $1$vg because this language doesn't support valgrind" > ;; > - example4* | tar*) > + example4*) > # These tests are written in Perl so we have to check that > # the Perl plugin was compiled. > if nbdkit perl --version; then run_test $1; fi > diff --git a/wrapper.c b/wrapper.c > index b1e2ce2f..c27afae0 100644 > --- a/wrapper.c > +++ b/wrapper.c > @@ -85,7 +85,7 @@ static size_t len; > static bool > is_perl_plugin (const char *name) > { > - return strcmp (name, "example4") == 0 || strcmp (name, "tar") == 0; > + return strcmp (name, "example4") == 0; > } > > static void > diff --git a/.gitignore b/.gitignore > index f02d19dc..48a5302e 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -79,7 +79,6 @@ plugins/*/*.3 > /plugins/ocaml/nbdkit-ocamlexample-plugin.so > /plugins/rust/Cargo.lock > /plugins/rust/target > -/plugins/tar/nbdkit-tar-plugin > /plugins/tmpdisk/default-command.c > /podwrapper.pl > /server/local/nbdkit.pc > diff --git a/README b/README > index 4f584828..7733761e 100644 > --- a/README > +++ b/README > @@ -124,13 +124,13 @@ For the bittorrent plugin: > > - libtorrent-rasterbar (https://www.libtorrent.org) > > -For the Perl, example4 and tar plugins: > +For the Perl and example4 plugins: > > - perl interpreter > > - perl development libraries > > - - perl modules ExtUtils::Embed, IO::File and Cwd > + - perl modules ExtUtils::Embed > > For the Python plugin: > > -- > 2.25.0 >
Richard W.M. Jones
2020-Jun-28 21:35 UTC
Re: [Libguestfs] [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.
On Sun, Jun 28, 2020 at 04:46:59PM +0300, Nir Soffer wrote:> On Sun, Jun 28, 2020 at 4:03 PM Richard W.M. Jones <rjones@redhat.com> wrote: > ... > > + fprintf (fp, "LANG=C tar --no-auto-compress -tRvf "); > > Using -R is nice, but is block size documented? > > Also --block-number would be nicer.Yes the block size is always 512 bytes: https://www.gnu.org/software/tar/manual/html_node/Blocking.html By the way, FreeBSD tar will not work (and presumably other non-GNU tar), so I guess we should probably try to probe tar and come up with a nicer error message: https://www.freebsd.org/cgi/man.cgi?tar(1)> > + shell_quote (tarfile, fp); > > This is questionable. Why use string and quote the string instead of using > argv directly?Convenience basically. The alternative is to fork and exec the tar command in a subprocess, while reading from a pipe that we have created, which is a pain to write correctly.> > +/* Read data from the file. */ > > +static int > > +tar_pread (void *handle, void *buf, uint32_t count, uint64_t offs) > > This should be identical to file plugin, on? can we reuse the same code > for reading files?Yes and no. A few months ago I actually had a proposal to unify all the "file-like" plugins. I believe the latest posted version was this one: https://www.redhat.com/archives/libguestfs/2020-April/msg00083.html but I have some updates in a branch here. Anyway when I was looking at this rewrite I considered resurrecting this work and building on top of it. However it doesn't work directly because:> > +{ > > + struct handle *h = handle; > > + > > + offs += offset;you have to add an offset to each request. Of course we could make a generic file ops which supported offsets (and also a non-FUA extension to deal with the tmpdisk plugin), but it all gets a bit complicated and not as generic as it initially looked. For completeness here are the other ways I investigated: * Make generic read, write, zero, trim, extents operations on local files. The file-like plugins would consume those directly, after doing any adjustments eg for offset. Unfortunately the operations that you have to write are not very "pure" and depend on a bunch of state across calls and you end up not having much common code: https://github.com/libguestfs/nbdkit/blob/f6d4365364f2c90dde0166ae4355f74f28e112ff/plugins/file/file.c#L149 * Have the new tar plugin re-exec nbdkit and run the ordinary file plugin + offset filter. After experience with the VDDK plugin which does this through necessity, I don't want to go there right now. See this file to understand the kind of complexity this introduces: https://github.com/libguestfs/nbdkit/blob/master/plugins/vddk/reexec.c * Instead of writing a tar plugin, write a wrapper script which hands off to file plugin + offset filter. From the user's point of view the script would work a lot differently from other plugins. So no good ideas so far. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Eric Blake
2020-Jul-06 19:08 UTC
Re: [Libguestfs] [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.
On 6/28/20 4:35 PM, Richard W.M. Jones wrote:> For completeness here are the other ways I investigated: > > * Make generic read, write, zero, trim, extents operations on local > files. The file-like plugins would consume those directly, after > doing any adjustments eg for offset. Unfortunately the operations > that you have to write are not very "pure" and depend on a bunch of > state across calls and you end up not having much common code: > https://github.com/libguestfs/nbdkit/blob/f6d4365364f2c90dde0166ae4355f74f28e112ff/plugins/file/file.c#L149 > > * Have the new tar plugin re-exec nbdkit and run the ordinary file > plugin + offset filter. After experience with the VDDK plugin which > does this through necessity, I don't want to go there right now. > See this file to understand the kind of complexity this introduces: > https://github.com/libguestfs/nbdkit/blob/master/plugins/vddk/reexec.c > > * Instead of writing a tar plugin, write a wrapper script which hands > off to file plugin + offset filter. From the user's point of view > the script would work a lot differently from other plugins. > > So no good ideas so far.How hard is it to write a tar filter instead of a tar plugin (similar to how we moved ext2 from plugin to filter)? - During .get_ready, we have to find some way to read the underlying plugin to feed a pipeline to tar to decode what the file contains (we can't just hand the file to tar, but instead have to feed it data through stdin; but the amount of work is no different: tar really does have to read the entire image during 'tar tRvf'). - There would no longer be a tar=... parameter, rather that is the role of the plugin - We'd have to rename the file=... parameter to something that won't conflict with the most common use of having the file plugin serve the tar file - But once we've done the .get_ready scan of the entire tar file, we can then service offsets to the plugin similarly to how the offset filter works -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Reasonably Related Threads
- Re: [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.
- Re: [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.
- [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.
- Re: [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.
- Re: [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.