Eric Blake
2020-Jul-06 18:58 UTC
Re: [Libguestfs] [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.
On 6/28/20 8:02 AM, Richard W.M. Jones wrote:> --- > plugins/tar/{tar.pl => nbdkit-tar-plugin.pod} | 145 ++------- > configure.ac | 2 - > plugins/tar/Makefile.am | 41 +-- > tests/Makefile.am | 14 +- > plugins/tar/tar.c | 286 ++++++++++++++++++ > tests/test-dump-plugin.sh | 2 +- > tests/test-help-plugin.sh | 2 +- > tests/test-tar-info.sh | 67 ++++ > tests/test-tar.sh | 22 +- > tests/test-version-plugin.sh | 2 +- > wrapper.c | 2 +- > .gitignore | 1 - > README | 4 +- > 13 files changed, 427 insertions(+), 163 deletions(-) >> > - nbdkit tar tar=FILENAME.tar file=PATH_INSIDE_TAR > + nbdkit tar [tar=]FILENAME.tar file=PATH_INSIDE_TARI'm considering a followup patch to allow file=exportname as a magic filename, similarly to how we did in the ext2 filter.> +=item B<file=>PATH_INSIDE_TAR > + > +The path of the file inside the tarball to serve. This parameter is > +required. It must exactly match the name stored in the tarball, so > +use S<C<tar tf filename.tar>>However, I'm a bit worried about how it would work if the tarfile itself includes a file named 'exportname' in the top directory of the tarfile. A quick test confirms my worry: $ cd /tmp $ touch exportname $ tar cf f.tar exportname $ tar tf f.tar exportname $ LANG=C tar --no-auto-compress -tRvf f.tar exportname block 0: -rw-rw-r-- eblake/eblake 0 2020-07-06 13:37 exportname block 1: ** Block of NULs ** $ LANG=C tar --no-auto-compress -tRvf f.tar ./exportname block 1: ** Block of NULs ** tar: ./exportname: Not found in archive tar: Exiting with failure status due to previous errors so if we like the idea, we'd have to allow the user to specify mutually-exclusive config parameters: either file= to something within the file, or exportname=on to allow the client to choose, where we then validate that exactly one of those two options is configured.> +++ b/plugins/tar/Makefile.am > @@ -1,5 +1,5 @@ > # nbdkit > -# Copyright (C) 2013-2020 Red Hat Inc. > +# Copyright (C) 2018-2020 Red Hat Inc.Interesting change in dates.> +++ b/plugins/tar/tar.c > @@ -0,0 +1,286 @@ > +/* nbdkit > + * Copyright (C) 2018-2020 Red Hat Inc. > + *> +static int > +tar_config_complete (void) > +{ > + if (tarfile == NULL || file == NULL) { > + nbdkit_error ("you must supply the tar=<TARFILE> and file=<FILENAME> " > + "parameters"); > + return -1; > + } > + > + return 0; > +} > + > +#define tar_config_help \ > + "[tar=]<TARBALL> (required) The name of the tar file.\n" \ > + "file=<FILENAME> The path inside the tar file to server."Should '(required)' be listed on both lines? (Not necessarily on the second, if we go with the exportname=on option)> + > +static int > +tar_get_ready (void) > +{ > + FILE *fp; > + CLEANUP_FREE char *cmd = NULL; > + size_t len = 0; > + bool scanned_ok; > + char s[256]; > + > + /* Construct the tar command to examine the tar file. */ > + fp = open_memstream (&cmd, &len); > + if (fp == NULL) { > + nbdkit_error ("open_memstream: %m"); > + return -1; > + } > + fprintf (fp, "LANG=C tar --no-auto-compress -tRvf ");Use of --no-auto-compress is specific to GNU tar, do we care? Should we allow a 'tar=/path/to/tar' parameter during .config to allow a user to point to an alternative tar?> + shell_quote (tarfile, fp); > + fputc (' ', fp); > + shell_quote (file, fp); > + if (fclose (fp) == EOF) { > + nbdkit_error ("memstream failed: %m"); > + return -1; > + } > + > + /* Run the command and read the first line of the output. */ > + nbdkit_debug ("%s", cmd); > + fp = popen (cmd, "r"); > + if (fp == NULL) { > + nbdkit_error ("tar: %m"); > + return -1; > + } > + scanned_ok = fscanf (fp, "block %" SCNu64 ": %*s %*s %" SCNu64, > + &offset, &size) == 2;fscanf() parsing an integer is subject to undefined behavior on overflow. (Yes, I know it is a pre-existing and prevalent issue...)> + /* We have to now read and discard the rest of the output until EOF. */ > + while (fread (s, sizeof s, 1, fp) > 0) > + ; > + if (pclose (fp) != 0) { > + nbdkit_error ("tar subcommand failed, " > + "check that the file really exists in the tarball"); > + return -1; > + }If we add exportname support, we'll have to defer checking for file existence until during .open.> + > + if (!scanned_ok) { > + nbdkit_error ("unexpected output from the tar subcommand"); > + return -1; > + } > + > + /* Adjust the offset: Add 1 for the tar header, then multiply by the > + * block size. > + */ > + offset = (offset+1) * 512;Is 512 always correct, or can a tar created with -b > 1 cause issues?> + > + nbdkit_debug ("tar: offset %" PRIu64 ", size %" PRIu64, offset, size); > + > + /* Check it looks sensible. XXX We ought to check it doesn't exceed > + * the size of the tar file. > + */ > + if (offset >= INT64_MAX || size >= INT64_MAX) { > + nbdkit_error ("internal error: calculated offset and size are wrong"); > + return -1; > + } > + > + return 0; > +} > + > +struct handle { > + int fd; > +}; > + > +static void * > +tar_open (int readonly) > +{ > + struct handle *h; > + > + assert (offset > 0); /* Cannot be zero because of tar header. */ > + > + h = calloc (1, sizeof *h); > + if (h == NULL) { > + nbdkit_error ("calloc: %m"); > + return NULL; > + } > + h->fd = open (tarfile, (readonly ? O_RDONLY : O_RDWR) | O_CLOEXEC);Should we really be opening a different fd per client, or can we get away with opening only one fd during .get_ready and sharing it among all clients? The unguarded use of O_CLOEXEC makes sense, but may cause compilation issues on Haiku.> + if (h->fd == -1) { > + nbdkit_error ("%s: %m", tarfile); > + free (h); > + return NULL; > + } > + > + return h; > +} > + > +#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL > + > +/* Get the file size. */ > +static int64_t > +tar_get_size (void *handle) > +{ > + return size; > +} > + > +/* Serves the same data over multiple connections. */ > +static int > +tar_can_multi_conn (void *handle) > +{ > + return 1; > +}Needs a tweak if we add exportname=on> + > +static int > +tar_can_cache (void *handle) > +{ > + /* Let nbdkit call pread to populate the file system cache. */ > + return NBDKIT_CACHE_EMULATE; > +} > + > +/* Read data from the file. */ > +static int > +tar_pread (void *handle, void *buf, uint32_t count, uint64_t offs) > +{ > + struct handle *h = handle; > + > + offs += offset; > + > + while (count > 0) { > + ssize_t r = pread (h->fd, buf, count, offs); > + if (r == -1) { > + nbdkit_error ("pread: %m"); > + return -1; > + } > + if (r == 0) { > + nbdkit_error ("pread: unexpected end of file"); > + return -1; > + } > + buf += r; > + count -= r; > + offs += r; > + } > + > + return 0; > +} > + > +/* Write data to the file. */ > +static int > +tar_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offs) > +{ > + struct handle *h = handle; > + > + offs += offset; > + > + while (count > 0) { > + ssize_t r = pwrite (h->fd, buf, count, offs);Does this always work even when the tar file was created with sparse file support? For that matter, if the tar file was created with sparse file contents, are we even guaranteed that a contiguous offset of the tar file is blindly usable as the data to serve?> + if (r == -1) { > + nbdkit_error ("pwrite: %m"); > + return -1; > + } > + buf += r; > + count -= r; > + offs += r; > + } > + > + return 0; > +} > + > +static struct nbdkit_plugin plugin = { > + .name = "tar", > + .longname = "nbdkit tar plugin", > + .version = PACKAGE_VERSION, > + .unload = tar_unload, > + .config = tar_config, > + .config_complete = tar_config_complete, > + .config_help = tar_config_help, > + .magic_config_key = "tar", > + .get_ready = tar_get_ready, > + .open = tar_open, > + .get_size = tar_get_size, > + .can_multi_conn = tar_can_multi_conn, > + .can_cache = tar_can_cache, > + .pread = tar_pread, > + .pwrite = tar_pwrite, > + .errno_is_preserved = 1,No .extents makes sense if the tar file does not record sparseness, but may be needed if we support sparse tar files. $ truncate --size=1M large $ echo 'hi' >> large $ tar cSf f.tar large $ ls -l large f.tar -rw-rw-r--. 1 eblake eblake 10240 Jul 6 13:57 f.tar -rw-rw-r--. 1 eblake eblake 1048579 Jul 6 13:48 large $ LANG=C tar --no-auto-compress -tRvf f.tar large block 0: -rw-rw-r-- eblake/eblake 1048579 2020-07-06 13:48 large block 2: ** Block of NULs ** Yep, we need to special-case sparse tar files :( -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2020-Jul-06 20:44 UTC
Re: [Libguestfs] [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.
On Mon, Jul 06, 2020 at 01:58:35PM -0500, Eric Blake wrote:> so if we like the idea, we'd have to allow the user to specify > mutually-exclusive config parameters: either file= to something > within the file, or exportname=on to allow the client to choose, > where we then validate that exactly one of those two options is > configured.Right, I think that would be the way to do it, to make it fully general, and to avoid any possibility that someone turns on the exportname feature by accident (which could be a security issue).> >+++ b/plugins/tar/Makefile.am > >@@ -1,5 +1,5 @@ > > # nbdkit > >-# Copyright (C) 2013-2020 Red Hat Inc. > >+# Copyright (C) 2018-2020 Red Hat Inc. > > Interesting change in dates.I think that was a copy/paste error effectively. I copied the Makefile from an existing C plugin, rather than attempting to rewrite the old Perl-based one.> >+static int > >+tar_config_complete (void) > >+{ > >+ if (tarfile == NULL || file == NULL) { > >+ nbdkit_error ("you must supply the tar=<TARFILE> and file=<FILENAME> " > >+ "parameters"); > >+ return -1; > >+ } > >+ > >+ return 0; > >+} > >+ > >+#define tar_config_help \ > >+ "[tar=]<TARBALL> (required) The name of the tar file.\n" \ > >+ "file=<FILENAME> The path inside the tar file to server." > > Should '(required)' be listed on both lines? (Not necessarily on > the second, if we go with the exportname=on option)Yes, that's a bug (until exportname is implemented) - will fix soon.> >+ > >+static int > >+tar_get_ready (void) > >+{ > >+ FILE *fp; > >+ CLEANUP_FREE char *cmd = NULL; > >+ size_t len = 0; > >+ bool scanned_ok; > >+ char s[256]; > >+ > >+ /* Construct the tar command to examine the tar file. */ > >+ fp = open_memstream (&cmd, &len); > >+ if (fp == NULL) { > >+ nbdkit_error ("open_memstream: %m"); > >+ return -1; > >+ } > >+ fprintf (fp, "LANG=C tar --no-auto-compress -tRvf "); > > Use of --no-auto-compress is specific to GNU tar, do we care?I copied that from the existing Perl plugin, but note that -R is also not supported by non-GNU tar.> Should we allow a 'tar=/path/to/tar' parameter during .config to > allow a user to point to an alternative tar?Yes, this is a potential enhancement, especially where GNU tar might be called "gtar" (as it is on FreeBSD).> >+ shell_quote (tarfile, fp); > >+ fputc (' ', fp); > >+ shell_quote (file, fp); > >+ if (fclose (fp) == EOF) { > >+ nbdkit_error ("memstream failed: %m"); > >+ return -1; > >+ } > >+ > >+ /* Run the command and read the first line of the output. */ > >+ nbdkit_debug ("%s", cmd); > >+ fp = popen (cmd, "r"); > >+ if (fp == NULL) { > >+ nbdkit_error ("tar: %m"); > >+ return -1; > >+ } > >+ scanned_ok = fscanf (fp, "block %" SCNu64 ": %*s %*s %" SCNu64, > >+ &offset, &size) == 2; > > fscanf() parsing an integer is subject to undefined behavior on > overflow. (Yes, I know it is a pre-existing and prevalent issue...) > > >+ /* We have to now read and discard the rest of the output until EOF. */ > >+ while (fread (s, sizeof s, 1, fp) > 0) > >+ ; > >+ if (pclose (fp) != 0) { > >+ nbdkit_error ("tar subcommand failed, " > >+ "check that the file really exists in the tarball"); > >+ return -1; > >+ } > > If we add exportname support, we'll have to defer checking for file > existence until during .open. > > >+ > >+ if (!scanned_ok) { > >+ nbdkit_error ("unexpected output from the tar subcommand"); > >+ return -1; > >+ } > >+ > >+ /* Adjust the offset: Add 1 for the tar header, then multiply by the > >+ * block size. > >+ */ > >+ offset = (offset+1) * 512; > > Is 512 always correct, or can a tar created with -b > 1 cause issues?Nir actually raised this issue before and I convinced myself at the time that the block size is always 512 bytes. I didn't notice the -b option til now however, but I don't know if it affects this block size of not, but I'd say it's vanishingly unlikely that there exist any OVA files in the wild that use this feature (and if they do, even less likely that anything can read them).> >+ h->fd = open (tarfile, (readonly ? O_RDONLY : O_RDWR) | O_CLOEXEC); > > Should we really be opening a different fd per client, or can we get > away with opening only one fd during .get_ready and sharing it among > all clients?The original code actually opened the file once, but it had the disadvantage that you had to always open the file for writes (because you don't know if the client will request read-only or not). But that breaks if the tar file itself cannot be opened for writing. So I believe it is necessary to open it per handle, unless you can think of another way to get around that..> The unguarded use of O_CLOEXEC makes sense, but may cause > compilation issues on Haiku.I'll check this.> >+/* Write data to the file. */ > >+static int > >+tar_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offs) > >+{ > >+ struct handle *h = handle; > >+ > >+ offs += offset; > >+ > >+ while (count > 0) { > >+ ssize_t r = pwrite (h->fd, buf, count, offs); > > Does this always work even when the tar file was created with sparse > file support? For that matter, if the tar file was created with > sparse file contents, are we even guaranteed that a contiguous > offset of the tar file is blindly usable as the data to serve?We've used the same technique on thousands of OVA files without hitting any problem so far, so I guess that ...> $ truncate --size=1M large > $ echo 'hi' >> large > $ tar cSf f.tar large > $ ls -l large f.tar > -rw-rw-r--. 1 eblake eblake 10240 Jul 6 13:57 f.tar > -rw-rw-r--. 1 eblake eblake 1048579 Jul 6 13:48 large > $ LANG=C tar --no-auto-compress -tRvf f.tar large > block 0: -rw-rw-r-- eblake/eblake 1048579 2020-07-06 13:48 large > block 2: ** Block of NULs ** > > Yep, we need to special-case sparse tar files :(.. OVAs don't use this feature. I wonder if it's a GNU feature? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Eric Blake
2020-Jul-06 21:44 UTC
Re: [Libguestfs] [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.
On 7/6/20 3:44 PM, Richard W.M. Jones wrote:>> >> Does this always work even when the tar file was created with sparse >> file support? For that matter, if the tar file was created with >> sparse file contents, are we even guaranteed that a contiguous >> offset of the tar file is blindly usable as the data to serve? > > We've used the same technique on thousands of OVA files without > hitting any problem so far, so I guess that ... > >> $ truncate --size=1M large >> $ echo 'hi' >> large >> $ tar cSf f.tar large >> $ ls -l large f.tar >> -rw-rw-r--. 1 eblake eblake 10240 Jul 6 13:57 f.tar >> -rw-rw-r--. 1 eblake eblake 1048579 Jul 6 13:48 large >> $ LANG=C tar --no-auto-compress -tRvf f.tar large >> block 0: -rw-rw-r-- eblake/eblake 1048579 2020-07-06 13:48 large >> block 2: ** Block of NULs ** >> >> Yep, we need to special-case sparse tar files :( > > .. OVAs don't use this feature. I wonder if it's a GNU feature?-S exists in both gtar and in star (Joerg Schilly's tar [1]; Joerg often complains on the GNU tar mailing list that his 'star' is more powerful than GNU tar), but sparse support is indeed missing in other tar implementations (technically, POSIX only specifies 'pax', not 'tar' [2], although the underlying data format is supposed to be compatible enough, and pax lacks -S...). [1] https://docs.oracle.com/cd/E71197_01/SAMCR/star.1m.htm [2] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Seemingly Similar Threads
- Re: [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.
- Re: [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.
- [PATCH nbdkit 2/2] tar: Rewrite the tar plugin (again), this time in C.
- [PATCH nbdkit] New filter: tar.
- [PATCH nbdkit 0/2] tar: Rewrite the tar plugin (again), this time in C.