Richard W.M. Jones
2018-Nov-21 15:46 UTC
[Libguestfs] [PATCH nbdkit 0/2] Rewrite xz plugin as a filter.
Matt asked if xz should really be a filter rather than a plugin. The answer is yes, of course it should be! That's been something in the todo file for a while. The commit converts the xz plugin code into a filter (leaving the plugin around, but deprecating it). plugin: nbdkit xz file.xz filter: nbdkit --filter=xz file file.xz plugin: # can't be done filter: nbdkit --filter=xz curl url=https://example.com/disk.xz This is only lightly tested but it works for local files and for the curl example given in the commit message. Unfortunately because of the very large block size used in the Fedora cloud image, the curl example is barely usable. We should get them to use a more reasonable block size such as 16M (currently 192M). Rich.
Richard W.M. Jones
2018-Nov-21 15:46 UTC
[Libguestfs] [PATCH nbdkit 1/2] Rewrite xz plugin as a filter.
The xz plugin should really be a filter so that it can be applied on top of other sources, especially curl: nbdkit --filter=xz curl url=https://download.fedoraproject.org/pub/fedora/linux/releases/29/Cloud/x86_64/images/Fedora-Cloud-Base-29-1.2.x86_64.raw.xz This repurposes the xz test to test only the filter. The plugin is no longer tested as it will be deprecated in the next commit. --- TODO | 2 - configure.ac | 8 +- filters/xz/Makefile.am | 71 +++++ filters/xz/blkcache.c | 146 ++++++++++ filters/xz/blkcache.h | 50 ++++ filters/xz/nbdkit-xz-filter.pod | 91 ++++++ filters/xz/xz.c | 230 +++++++++++++++ filters/xz/xzfile.c | 491 ++++++++++++++++++++++++++++++++ filters/xz/xzfile.h | 71 +++++ tests/Makefile.am | 38 +-- tests/test-xz.c | 4 +- 11 files changed, 1176 insertions(+), 26 deletions(-) diff --git a/TODO b/TODO index f732780..d37f594 100644 --- a/TODO +++ b/TODO @@ -90,8 +90,6 @@ Suggestions for filters * tar plugin should really be a filter -* xz plugin should really be a filter - * gzip plugin should really be a filter * masking plugin features for testing clients (see 'nozero' and 'fua' diff --git a/configure.ac b/configure.ac index b355915..1325767 100644 --- a/configure.ac +++ b/configure.ac @@ -599,10 +599,10 @@ AS_IF([test "$with_zlib" != "no"],[ ]) AM_CONDITIONAL([HAVE_ZLIB],[test "x$ZLIB_LIBS" != "x"]) -dnl Check for liblzma (only if you want to compile the xz plugin). +dnl Check for liblzma (only if you want to compile the xz filter). AC_ARG_WITH([liblzma], [AS_HELP_STRING([--without-liblzma], - [disable xz plugin @<:@default=check@:>@])], + [disable xz filter @<:@default=check@:>@])], [], [with_liblzma=check]) AS_IF([test "$with_liblzma" != "no"],[ @@ -611,7 +611,7 @@ AS_IF([test "$with_liblzma" != "no"],[ AC_SUBST([LIBLZMA_LIBS]) AC_DEFINE([HAVE_LIBLZMA],[1],[liblzma found at compile time.]) ], - [AC_MSG_WARN([liblzma not found, xz plugin will be disabled])]) + [AC_MSG_WARN([liblzma not found, xz filter will be disabled])]) ]) AM_CONDITIONAL([HAVE_LIBLZMA],[test "x$LIBLZMA_LIBS" != "x"]) @@ -728,6 +728,7 @@ filters="\ offset \ partition \ truncate \ + xz \ " AC_SUBST([plugins]) AC_SUBST([lang_plugins]) @@ -790,6 +791,7 @@ AC_CONFIG_FILES([Makefile filters/offset/Makefile filters/partition/Makefile filters/truncate/Makefile + filters/xz/Makefile src/Makefile src/nbdkit.pc tests/functions.sh diff --git a/filters/xz/Makefile.am b/filters/xz/Makefile.am new file mode 100644 index 0000000..4b4edc7 --- /dev/null +++ b/filters/xz/Makefile.am @@ -0,0 +1,71 @@ +# nbdkit +# Copyright (C) 2013-2018 Red Hat Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +include $(top_srcdir)/common-rules.mk + +EXTRA_DIST = nbdkit-xz-filter.pod + +if HAVE_LIBLZMA + +filter_LTLIBRARIES = nbdkit-xz-filter.la + +nbdkit_xz_filter_la_SOURCES = \ + blkcache.c \ + blkcache.h \ + xz.c \ + xzfile.c \ + xzfile.h \ + $(top_srcdir)/include/nbdkit-filter.h + +nbdkit_xz_filter_la_CPPFLAGS = \ + -I$(top_srcdir)/include +nbdkit_xz_filter_la_CFLAGS = \ + $(WARNINGS_CFLAGS) \ + $(LIBLZMA_CFLAGS) +nbdkit_xz_filter_la_LIBADD = \ + $(LIBLZMA_LIBS) +nbdkit_xz_filter_la_LDFLAGS = \ + -module -avoid-version -shared + +if HAVE_POD + +man_MANS = nbdkit-xz-filter.1 +CLEANFILES += $(man_MANS) + +nbdkit-xz-filter.1: nbdkit-xz-filter.pod + $(PODWRAPPER) --section=1 --man $@ \ + --html $(top_builddir)/html/$@.html \ + $< + +endif HAVE_POD + +endif diff --git a/filters/xz/blkcache.c b/filters/xz/blkcache.c new file mode 100644 index 0000000..65c8a03 --- /dev/null +++ b/filters/xz/blkcache.c @@ -0,0 +1,146 @@ +/* nbdkit + * Copyright (C) 2013 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <stdint.h> +#include <inttypes.h> + +#include <nbdkit-plugin.h> + +#include "blkcache.h" + +/* Implemented as a very simple LRU list with a fixed depth. */ +struct blkcache { + size_t maxdepth; + struct block *blocks; + blkcache_stats stats; +}; + +struct block { + uint64_t start; + uint64_t size; + char *data; +}; + +blkcache * +new_blkcache (size_t maxdepth) +{ + blkcache *c; + + c = malloc (sizeof *c); + if (!c) { + nbdkit_error ("malloc: %m"); + return NULL; + } + + c->blocks = calloc (maxdepth, sizeof (struct block)); + if (!c->blocks) { + nbdkit_error ("calloc: %m"); + free (c); + return NULL; + } + c->maxdepth = maxdepth; + c->stats.hits = c->stats.misses = 0; + + return c; +} + +void +free_blkcache (blkcache *c) +{ + size_t i; + + for (i = 0; i < c->maxdepth; ++i) + free (c->blocks[i].data); + free (c->blocks); + free (c); +} + +char * +get_block (blkcache *c, uint64_t offset, uint64_t *start, uint64_t *size) +{ + size_t i; + struct block tmp; + + for (i = 0; i < c->maxdepth; ++i) { + if (c->blocks[i].data != NULL && + c->blocks[i].start <= offset && + offset < c->blocks[i].start + c->blocks[i].size) { + /* This block is now most recently used, so put it at the start. */ + if (i > 0) { + tmp = c->blocks[0]; + c->blocks[0] = c->blocks[i]; + c->blocks[i] = tmp; + } + + c->stats.hits++; + *start = c->blocks[0].start; + *size = c->blocks[0].size; + return c->blocks[0].data; + } + } + + c->stats.misses++; + return NULL; +} + +int +put_block (blkcache *c, uint64_t start, uint64_t size, char *data) +{ + size_t i; + + /* Eject the least recently used block. */ + i = c->maxdepth-1; + if (c->blocks[i].data != NULL) + free (c->blocks[i].data); + + for (; i >= 1; --i) + c->blocks[i] = c->blocks[i-1]; + + /* The new block is most recently used, so it goes at the start. */ + c->blocks[0].start = start; + c->blocks[0].size = size; + c->blocks[0].data = data; + + return 0; +} + +void +blkcache_get_stats (blkcache *c, blkcache_stats *ret) +{ + memcpy (ret, &c->stats, sizeof (c->stats)); +} diff --git a/filters/xz/blkcache.h b/filters/xz/blkcache.h new file mode 100644 index 0000000..e68f5d1 --- /dev/null +++ b/filters/xz/blkcache.h @@ -0,0 +1,50 @@ +/* nbdkit + * Copyright (C) 2013 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#ifndef NBDKIT_BLKCACHE_H +#define NBDKIT_BLKCACHE_H + +typedef struct blkcache blkcache; + +typedef struct blkcache_stats { + size_t hits; + size_t misses; +} blkcache_stats; + +extern blkcache *new_blkcache (size_t maxdepth); +extern void free_blkcache (blkcache *); +extern char *get_block (blkcache *, uint64_t offset, uint64_t *start, uint64_t *size); +extern int put_block (blkcache *, uint64_t start, uint64_t size, char *data); +extern void blkcache_get_stats (blkcache *, blkcache_stats *ret); + +#endif /* NBDKIT_XZFILE_H */ diff --git a/filters/xz/nbdkit-xz-filter.pod b/filters/xz/nbdkit-xz-filter.pod new file mode 100644 index 0000000..e595388 --- /dev/null +++ b/filters/xz/nbdkit-xz-filter.pod @@ -0,0 +1,91 @@ +=head1 NAME + +nbdkit-xz-filter - nbdkit xz filter + +=head1 SYNOPSIS + + nbdkit --filter=xz file FILENAME.xz + + nbdkit --filter=xz curl url=https://example.com/FILENAME.xz + +=head1 DESCRIPTION + +C<nbdkit-xz-filter> is a filter for L<nbdkit(1)> which uncompresses +the underlying plugin on the fly. The filter only supports read-only +connections. + +=head2 Getting best random access performance from xz + +L<xz(1)> files are split into streams and blocks. Most xz files +contain only one stream which contains one or more blocks. You can +find out how many streams and blocks are in an xz file by doing: + + $ xz --list winxp.img.xz + Strms Blocks Compressed Uncompressed Ratio Check Filename + 1 1 2,100.0 MiB 6,144.0 MiB 0.342 CRC64 winxp.img.xz + ↑↑↑ ↑↑↑ + streams blocks + +xz files are seekable on block boundaries only. Seeking is done by +seeking directly to the lower block boundary, then uncompressing data +until the precise byte is reached. As uncompressing xz data is slow, +B<to get best random access performance, you must prepare your xz +files with many small blocks.> + +Use the I<--block-size> option with a small-ish block size. For +example this is the same image compressed with a 16 MB block size: + + $ xz --best --block-size=16777216 winxp.img + ───────────────────── + $ xz --list winxp.img.xz + Strms Blocks Compressed Uncompressed Ratio Check Filename + 1 384 2,120.1 MiB 6,144.0 MiB 0.345 CRC64 winxp.img.xz + ↑↑↑ + blocks + +This file can be accessed randomly. At most 16 MB will have to be +uncompressed to seek to any byte. + +As you would expect, xz cannot compress as efficiently when using a +smaller block size. The space penalty in the above example is +S<E<lt> 1%> of the compressed file size. + +=head1 PARAMETERS + +=over 4 + +=item B<xz-max-block=>SIZE + +The maximum block size that the filter will read. The filter will +refuse to read xz files that contain any block larger than this size. + +See the discussion above about creating xz files with small block +sizes in order to reduce memory usage and increase performance. + +This parameter is optional. If not specified it defaults to 512M. + +=item B<xz-max-depth=>N + +Maximum number of blocks stored in the LRU block cache. + +This parameter is optional. If not specified it defaults to 8. + +The filter may allocate up to +S<maximum block size in file * maxdepth> +bytes of memory I<per connection>. + +=back + +=head1 SEE ALSO + +L<nbdkit(1)>, +L<nbdkit-filter(3)>, +L<xz(1)>. + +=head1 AUTHORS + +Richard W.M. Jones + +=head1 COPYRIGHT + +Copyright (C) 2013-2018 Red Hat Inc. diff --git a/filters/xz/xz.c b/filters/xz/xz.c new file mode 100644 index 0000000..5610187 --- /dev/null +++ b/filters/xz/xz.c @@ -0,0 +1,230 @@ +/* nbdkit + * Copyright (C) 2013-2018 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdarg.h> +#include <stdint.h> +#include <inttypes.h> +#include <string.h> +#include <unistd.h> +#include <errno.h> + +#include <lzma.h> + +#include <nbdkit-filter.h> + +#include "xzfile.h" +#include "blkcache.h" + +static uint64_t maxblock = 512 * 1024 * 1024; +static size_t maxdepth = 8; + +static int +xz_config (nbdkit_next_config *next, void *nxdata, + const char *key, const char *value) +{ + if (strcmp (key, "xz-max-block") == 0) { + int64_t r = nbdkit_parse_size (value); + if (r == -1) + return -1; + maxblock = (uint64_t) r; + return 0; + } + else if (strcmp (key, "xz-max-depth") == 0) { + size_t r; + + if (sscanf (value, "%zu", &r) != 1) { + nbdkit_error ("could not parse 'xz-max-depth' parameter"); + return -1; + } + if (r == 0) { + nbdkit_error ("'xz-max-depth' parameter must be >= 1"); + return -1; + } + + maxdepth = r; + return 0; + } + else + return next (nxdata, key, value); +} + +#define xz_config_help \ + "xz-max-block=<SIZE> (optional) Maximum block size allowed (default: 512M)\n"\ + "xz-max-depth=<N> (optional) Maximum blocks in cache (default: 4)\n" + +/* The per-connection handle. */ +struct xz_handle { + xzfile *xz; + + /* Block cache. */ + blkcache *c; +}; + +/* Create the per-connection handle. */ +static void * +xz_open (nbdkit_next_open *next, void *nxdata, int readonly) +{ + struct xz_handle *h; + + /* Always pass readonly=1 to the underlying plugin. */ + if (next (nxdata, 1) == -1) + return NULL; + + h = malloc (sizeof *h); + if (h == NULL) { + nbdkit_error ("malloc: %m"); + return NULL; + } + + h->c = new_blkcache (maxdepth); + if (!h->c) { + free (h); + return NULL; + } + + /* Initialized in xz_prepare. */ + h->xz = NULL; + + return h; +} + +/* Free up the per-connection handle. */ +static void +xz_close (void *handle) +{ + struct xz_handle *h = handle; + blkcache_stats stats; + + if (h->c) { + blkcache_get_stats (h->c, &stats); + + nbdkit_debug ("cache: hits = %zu, misses = %zu", stats.hits, stats.misses); + } + + xzfile_close (h->xz); + free_blkcache (h->c); + free (h); +} + +static int +xz_prepare (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) +{ + struct xz_handle *h = handle; + + h->xz = xzfile_open (next_ops, nxdata); + if (!h->xz) + return -1; + + if (maxblock < xzfile_max_uncompressed_block_size (h->xz)) { + nbdkit_error ("xz file largest block is bigger than maxblock\n" + "Either recompress the xz file with smaller blocks (see nbdkit-xz-plugin(1))\n" + "or make maxblock parameter bigger.\n" + "maxblock = %" PRIu64 " (bytes)\n" + "largest block in xz file = %" PRIu64 " (bytes)", + maxblock, + xzfile_max_uncompressed_block_size (h->xz)); + return -1; + } + + return 0; +} + +/* Get the file size. */ +static int64_t +xz_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) +{ + struct xz_handle *h = handle; + + return xzfile_get_size (h->xz); +} + +/* Read data from the file. */ +static int +xz_pread (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, void *buf, uint32_t count, uint64_t offset, + uint32_t flags, int *err) +{ + struct xz_handle *h = handle; + char *data; + uint64_t start, size; + uint32_t n; + + /* Find the block in the cache. */ + data = get_block (h->c, offset, &start, &size); + if (!data) { + /* Not in the cache. We need to read the block from the xz file. */ + data = xzfile_read_block (h->xz, next_ops, nxdata, flags, err, + offset, &start, &size); + if (data == NULL) + return -1; + put_block (h->c, start, size, data); + } + + /* It's possible if the blocks are really small or oddly aligned or + * if the requests are large that we need to read the following + * block to satisfy the request. + */ + n = count; + if (start + size - offset < n) + n = start + size - offset; + + memcpy (buf, &data[offset-start], n); + buf += n; + count -= n; + offset += n; + if (count > 0) + return xz_pread (next_ops, nxdata, h, buf, count, offset, flags, err); + + return 0; +} + +#define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS + +static struct nbdkit_filter filter = { + .name = "xz", + .longname = "nbdkit XZ filter", + .version = PACKAGE_VERSION, + .config = xz_config, + .config_help = xz_config_help, + .open = xz_open, + .close = xz_close, + .prepare = xz_prepare, + .get_size = xz_get_size, + .pread = xz_pread, +}; + +NBDKIT_REGISTER_FILTER(filter) diff --git a/filters/xz/xzfile.c b/filters/xz/xzfile.c new file mode 100644 index 0000000..aaaa475 --- /dev/null +++ b/filters/xz/xzfile.c @@ -0,0 +1,491 @@ +/* nbdkit + * Copyright (C) 2013 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +/* liblzma is a complex interface, so abstract it here. */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <string.h> +#include <stdint.h> +#include <inttypes.h> +#include <unistd.h> +#include <fcntl.h> +#include <sys/types.h> + +#include <nbdkit-filter.h> + +#include <lzma.h> + +#include "xzfile.h" + +#ifndef O_CLOEXEC +#define O_CLOEXEC 0 +#endif + +#define XZ_HEADER_MAGIC "\xfd" "7zXZ\0" +#define XZ_HEADER_MAGIC_LEN 6 +#define XZ_FOOTER_MAGIC "YZ" +#define XZ_FOOTER_MAGIC_LEN 2 + +struct xzfile { + lzma_index *idx; + size_t nr_streams; + size_t nr_blocks; + uint64_t max_uncompressed_block_size; +}; + +static bool check_header_magic (struct nbdkit_next_ops *next_ops, void *nxdata); +static lzma_index *parse_indexes (struct nbdkit_next_ops *next_ops, void *nxdata, size_t *); +static int iter_indexes (lzma_index *idx, size_t *, uint64_t *); + +xzfile * +xzfile_open (struct nbdkit_next_ops *next_ops, void *nxdata) +{ + xzfile *xz; + uint64_t size; + + xz = malloc (sizeof *xz); + if (xz == NULL) { + nbdkit_error ("malloc: %m"); + return NULL; + } + + /* Check file magic. */ + if (!check_header_magic (next_ops, nxdata)) { + nbdkit_error ("xz: not an xz file"); + goto err1; + } + + /* Read and parse the indexes. */ + xz->idx = parse_indexes (next_ops, nxdata, &xz->nr_streams); + if (xz->idx == NULL) + goto err1; + + /* Iterate over indexes to find the number of and largest block. */ + if (iter_indexes (xz->idx, + &xz->nr_blocks, &xz->max_uncompressed_block_size) == -1) + goto err1; + + size = lzma_index_uncompressed_size (xz->idx); + nbdkit_debug ("xz: size %" PRIu64 " bytes (%.1fM)", + size, size / 1024.0 / 1024.0); + nbdkit_debug ("xz: %zu streams, %zu blocks", xz->nr_streams, xz->nr_blocks); + nbdkit_debug ("xz: maximum uncompressed block size %" PRIu64 " bytes (%.1fM)", + xz->max_uncompressed_block_size, + xz->max_uncompressed_block_size / 1024.0 / 1024.0); + + return xz; + + err1: + free (xz); + return NULL; +} + +static bool +check_header_magic (struct nbdkit_next_ops *next_ops, void *nxdata) +{ + char buf[XZ_HEADER_MAGIC_LEN]; + int err; + + if (next_ops->pread (nxdata, buf, XZ_HEADER_MAGIC_LEN, 0, 0, &err) == -1) { + nbdkit_error ("xz: could not read header magic: error %d", err); + return false; + } + if (memcmp (buf, XZ_HEADER_MAGIC, XZ_HEADER_MAGIC_LEN) != 0) + return false; + return true; +} + +/* For explanation of this function, see src/xz/list.c:parse_indexes + * in the xz sources. + */ +static lzma_index * +parse_indexes (struct nbdkit_next_ops *next_ops, void *nxdata, + size_t *nr_streams) +{ + lzma_ret r; + int64_t size, pos, index_size; + int err; + uint8_t footer[LZMA_STREAM_HEADER_SIZE]; + uint8_t header[LZMA_STREAM_HEADER_SIZE]; + lzma_stream_flags footer_flags; + lzma_stream_flags header_flags; + lzma_stream strm = LZMA_STREAM_INIT; + lzma_index *combined_index = NULL; + lzma_index *this_index = NULL; + lzma_vli stream_padding = 0; + + *nr_streams = 0; + + /* Check file size is a multiple of 4 bytes. */ + pos = size = next_ops->get_size (nxdata); + if (pos == -1) { + nbdkit_error ("xz: get_size: %m"); + goto err; + } + if ((pos & 3) != 0) { + nbdkit_error ("xz: not an xz file: size is not a multiple of 4 bytes"); + goto err; + } + + /* Jump backwards through the file identifying each stream. */ + while (pos > 0) { + nbdkit_debug ("looping through streams: pos = %" PRIi64, pos); + + if (pos < LZMA_STREAM_HEADER_SIZE) { + nbdkit_error ("xz: corrupted file at %" PRIi64, pos); + goto err; + } + + if (next_ops->pread (nxdata, footer, LZMA_STREAM_HEADER_SIZE, + pos - LZMA_STREAM_HEADER_SIZE, + 0, &err) == -1) { + nbdkit_error ("xz: read stream footer: error %d", err); + goto err; + } + /* Skip stream padding. */ + if (footer[8] == 0 && footer[9] == 0 && + footer[10] == 0 && footer[11] == 0) { + stream_padding += 4; + pos -= 4; + continue; + } + + pos -= LZMA_STREAM_HEADER_SIZE; + (*nr_streams)++; + + nbdkit_debug ("decode stream footer at pos = %" PRIi64, pos); + + /* Does the stream footer look reasonable? */ + r = lzma_stream_footer_decode (&footer_flags, footer); + if (r != LZMA_OK) { + nbdkit_error ("xz: invalid stream footer (error %d)", r); + goto err; + } + nbdkit_debug ("backward_size = %" PRIu64, + (uint64_t) footer_flags.backward_size); + index_size = footer_flags.backward_size; + if (pos < index_size + LZMA_STREAM_HEADER_SIZE) { + nbdkit_error ("xz: invalid stream footer"); + goto err; + } + + pos -= index_size; + nbdkit_debug ("decode index at pos = %" PRIi64, pos); + + /* Decode the index. */ + r = lzma_index_decoder (&strm, &this_index, UINT64_MAX); + if (r != LZMA_OK) { + nbdkit_error ("xz: invalid stream index (error %d)", r); + goto err; + } + + do { + uint8_t buf[BUFSIZ]; + + strm.avail_in = index_size; + if (strm.avail_in > BUFSIZ) + strm.avail_in = BUFSIZ; + if (pos + strm.avail_in > size) + strm.avail_in = size - pos; + + if (next_ops->pread (nxdata, buf, strm.avail_in, pos, 0, &err) == -1) { + nbdkit_error ("xz: read index: error %d", err); + goto err; + } + index_size -= strm.avail_in; + + strm.next_in = buf; + r = lzma_code (&strm, LZMA_RUN); + } while (r == LZMA_OK); + + if (r != LZMA_STREAM_END) { + nbdkit_error ("xz: could not parse index (error %d)", r); + goto err; + } + + pos -= lzma_index_total_size (this_index) + LZMA_STREAM_HEADER_SIZE; + + nbdkit_debug ("decode stream header at pos = %" PRIi64, pos); + + /* Read and decode the stream header. */ + if (next_ops->pread (nxdata, header, LZMA_STREAM_HEADER_SIZE, pos, 0, + &err) == -1) { + nbdkit_error ("xz: read stream header: error %d", err); + goto err; + } + + r = lzma_stream_header_decode (&header_flags, header); + if (r != LZMA_OK) { + nbdkit_error ("xz: invalid stream header (error %d)", r); + goto err; + } + + /* Header and footer of the stream should be equal. */ + r = lzma_stream_flags_compare (&header_flags, &footer_flags); + if (r != LZMA_OK) { + nbdkit_error ("xz: header and footer of stream are not equal (error %d)", + r); + goto err; + } + + /* Store the decoded stream flags in this_index. */ + r = lzma_index_stream_flags (this_index, &footer_flags); + if (r != LZMA_OK) { + nbdkit_error ("xz: cannot read stream_flags from index (error %d)", r); + goto err; + } + + /* Store the amount of stream padding so far. Needed to calculate + * compressed offsets correctly in multi-stream files. + */ + r = lzma_index_stream_padding (this_index, stream_padding); + if (r != LZMA_OK) { + nbdkit_error ("xz: cannot set stream_padding in index (error %d)", r); + goto err; + } + + if (combined_index != NULL) { + r = lzma_index_cat (this_index, combined_index, NULL); + if (r != LZMA_OK) { + nbdkit_error ("xz: cannot combine indexes"); + goto err; + } + } + + combined_index = this_index; + this_index = NULL; + } + + lzma_end (&strm); + + return combined_index; + + err: + lzma_end (&strm); + lzma_index_end (this_index, NULL); + lzma_index_end (combined_index, NULL); + return NULL; +} + +/* Iterate over the indexes to find the number of blocks and + * the largest block. + */ +static int +iter_indexes (lzma_index *idx, + size_t *nr_blocks, uint64_t *max_uncompressed_block_size) +{ + lzma_index_iter iter; + + *nr_blocks = 0; + *max_uncompressed_block_size = 0; + + lzma_index_iter_init (&iter, idx); + while (!lzma_index_iter_next (&iter, LZMA_INDEX_ITER_NONEMPTY_BLOCK)) { + if (iter.block.uncompressed_size > *max_uncompressed_block_size) + *max_uncompressed_block_size = iter.block.uncompressed_size; + (*nr_blocks)++; + } + + return 0; +} + +void +xzfile_close (xzfile *xz) +{ + if (xz) { + lzma_index_end (xz->idx, NULL); + free (xz); + } +} + +uint64_t +xzfile_max_uncompressed_block_size (xzfile *xz) +{ + return xz->max_uncompressed_block_size; +} + +uint64_t +xzfile_get_size (xzfile *xz) +{ + return lzma_index_uncompressed_size (xz->idx); +} + +char * +xzfile_read_block (xzfile *xz, + struct nbdkit_next_ops *next_ops, + void *nxdata, uint32_t flags, int *err, + uint64_t offset, + uint64_t *start_rtn, uint64_t *size_rtn) +{ + int64_t offs, size; + lzma_index_iter iter; + uint8_t header[LZMA_BLOCK_HEADER_SIZE_MAX]; + lzma_block block; + lzma_filter filters[LZMA_FILTERS_MAX + 1]; + lzma_ret r; + lzma_stream strm = LZMA_STREAM_INIT; + char *data; + size_t i; + + /* Read the total size of the underlying disk, so we don't + * read over the end. + */ + size = next_ops->get_size (nxdata); + if (size == -1) { + nbdkit_error ("xz: get_size: %m"); + return NULL; + } + + /* Locate the block containing the uncompressed offset. */ + lzma_index_iter_init (&iter, xz->idx); + if (lzma_index_iter_locate (&iter, offset)) { + nbdkit_error ("cannot find offset %" PRIu64 " in the xz file", offset); + return NULL; + } + + *start_rtn = iter.block.uncompressed_file_offset; + *size_rtn = iter.block.uncompressed_size; + + nbdkit_debug ("seek: block number %d at file offset %" PRIu64, + (int) iter.block.number_in_file, + (uint64_t) iter.block.compressed_file_offset); + + /* Read the block header. Start by reading a single byte which + * tell us how big the block header is. + */ + offs = iter.block.compressed_file_offset; + if (next_ops->pread (nxdata, header, 1, offs, 0, err) == -1) { + nbdkit_error ("xz: read: could not read block header byte: error %d", *err); + return NULL; + } + offs++; + + if (header[0] == '\0') { + nbdkit_error ("xz: read: unexpected invalid block in file, header[0] = 0"); + return NULL; + } + + block.version = 0; + block.check = iter.stream.flags->check; + block.filters = filters; + block.header_size = lzma_block_header_size_decode (header[0]); + + /* Now read and decode the block header. */ + if (next_ops->pread (nxdata, &header[1], block.header_size-1, offs, + 0, err) == -1) { + nbdkit_error ("xz: read: could not read block of compressed data: " + "error %d", *err); + return NULL; + } + offs += block.header_size - 1; + + r = lzma_block_header_decode (&block, NULL, header); + if (r != LZMA_OK) { + nbdkit_error ("invalid block header (error %d)", r); + return NULL; + } + + /* What this actually does is it checks that the block header + * matches the index. + */ + r = lzma_block_compressed_size (&block, iter.block.unpadded_size); + if (r != LZMA_OK) { + nbdkit_error ("cannot calculate compressed size (error %d)", r); + goto err1; + } + + /* Read the block data. */ + r = lzma_block_decoder (&strm, &block); + if (r != LZMA_OK) { + nbdkit_error ("invalid block (error %d)", r); + goto err1; + } + + data = malloc (*size_rtn); + if (data == NULL) { + nbdkit_error ("malloc (%" PRIu64 " bytes): %m\n" + "NOTE: If this error occurs, you need to recompress your xz files with a smaller block size. Use: 'xz --block-size=16777216 ...'.", + *size_rtn); + goto err1; + } + + strm.next_in = NULL; + strm.avail_in = 0; + strm.next_out = (uint8_t *) data; + strm.avail_out = block.uncompressed_size; + + do { + uint8_t buf[BUFSIZ]; + + if (strm.avail_in == 0) { + strm.avail_in = BUFSIZ; + if (offs + strm.avail_in > size) + strm.avail_in = size - offs; + if (strm.avail_in > 0) { + strm.next_in = buf; + if (next_ops->pread (nxdata, buf, strm.avail_in, offs, 0, err) == -1) { + nbdkit_error ("xz: read: error %d", *err); + goto err2; + } + offs += sizeof buf; + } + } + + r = lzma_code (&strm, LZMA_RUN); + } while (r == LZMA_OK); + + if (r != LZMA_OK && r != LZMA_STREAM_END) { + nbdkit_error ("could not parse block data (error %d)", r); + goto err2; + } + + lzma_end (&strm); + + for (i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i) + free (filters[i].options); + + return data; + + err2: + free (data); + lzma_end (&strm); + err1: + for (i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i) + free (filters[i].options); + + return NULL; +} diff --git a/filters/xz/xzfile.h b/filters/xz/xzfile.h new file mode 100644 index 0000000..9c9ddbc --- /dev/null +++ b/filters/xz/xzfile.h @@ -0,0 +1,71 @@ +/* nbdkit + * Copyright (C) 2013-2018 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +/* liblzma is a complex interface, so abstract it here. */ + +#ifndef NBDKIT_XZFILE_H +#define NBDKIT_XZFILE_H + +#include <nbdkit-filter.h> + +typedef struct xzfile xzfile; + +/* Open (and verify) the named xz file. */ +extern xzfile *xzfile_open (struct nbdkit_next_ops *next_ops, void *nxdata); + +/* Close the file and free up all resources. */ +extern void xzfile_close (xzfile *); + +/* Get (uncompressed) size of the largest block in the file. */ +extern uint64_t xzfile_max_uncompressed_block_size (xzfile *); + +/* Get the total uncompressed size of the file. */ +extern uint64_t xzfile_get_size (xzfile *); + +/* Read the xz file block that contains the byte at 'offset' in the + * uncompressed file. + * + * The uncompressed block of data, which probably begins before the + * requested byte and ends after it, is returned. The caller must + * free it. NULL is returned if there was an error. + * + * The start offset & size of the block relative to the uncompressed + * file are returned in *start and *size. + */ +extern char *xzfile_read_block (xzfile *xz, + struct nbdkit_next_ops *next_ops, + void *nxdata, uint32_t flags, int *err, + uint64_t offset, + uint64_t *start, uint64_t *size); + +#endif /* NBDKIT_XZFILE_H */ diff --git a/tests/Makefile.am b/tests/Makefile.am index 959d6e6..0cd2366 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -472,25 +472,6 @@ libvixDiskLib_la_CXXFLAGS = \ libvixDiskLib_la_LDFLAGS = \ -shared -version-number 6:0:0 -rpath /nowhere -# xz plugin test. -if HAVE_LIBLZMA -if HAVE_GUESTFISH - -LIBGUESTFS_TESTS += test-xz -check_DATA += disk.xz -MAINTAINERCLEANFILES += disk.xz - -test_xz_SOURCES = test-xz.c test.h -test_xz_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS) -test_xz_LDADD = libtest.la $(LIBGUESTFS_LIBS) - -disk.xz: disk - rm -f $@ - xz --best --block-size=16777216 -c disk > $@ - -endif HAVE_GUESTFISH -endif HAVE_LIBLZMA - # zero plugin test. TESTS += test-zero.sh @@ -746,4 +727,23 @@ TESTS += \ test-truncate2.sh \ test-truncate3.sh +# xz filter test. +if HAVE_LIBLZMA +if HAVE_GUESTFISH + +LIBGUESTFS_TESTS += test-xz +check_DATA += disk.xz +MAINTAINERCLEANFILES += disk.xz + +test_xz_SOURCES = test-xz.c test.h +test_xz_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS) +test_xz_LDADD = libtest.la $(LIBGUESTFS_LIBS) + +disk.xz: disk + rm -f $@ + xz --best --block-size=16777216 -c disk > $@ + +endif HAVE_GUESTFISH +endif HAVE_LIBLZMA + endif HAVE_PLUGINS diff --git a/tests/test-xz.c b/tests/test-xz.c index 46b1d04..7248a96 100644 --- a/tests/test-xz.c +++ b/tests/test-xz.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013 Red Hat Inc. + * Copyright (C) 2013-2018 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -51,7 +51,7 @@ main (int argc, char *argv[]) int r; char *data; - if (test_start_nbdkit ("xz", "-r", "disk.xz", NULL) == -1) + if (test_start_nbdkit ("--filter=xz", "file", "disk.xz", NULL) == -1) exit (EXIT_FAILURE); g = guestfs_create (); -- 2.19.0.rc0
Richard W.M. Jones
2018-Nov-21 15:46 UTC
[Libguestfs] [PATCH nbdkit 2/2] xz: Deprecate xz plugin.
Plan to remove it in the next-by-one stable release. --- plugins/xz/nbdkit-xz-plugin.pod | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/plugins/xz/nbdkit-xz-plugin.pod b/plugins/xz/nbdkit-xz-plugin.pod index 2bd0a1c..acd7817 100644 --- a/plugins/xz/nbdkit-xz-plugin.pod +++ b/plugins/xz/nbdkit-xz-plugin.pod @@ -6,6 +6,15 @@ nbdkit-xz-plugin - nbdkit xz plugin nbdkit xz [file=]FILENAME.xz +=head1 DEPRECATED + +B<The xz plugin is deprecated in S<nbdkit E<ge> 1.9.2> and will be +removed in S<nbdkit 1.12>>. It has been replaced with a filter with +the same functionality, see L<nbdkit-xz-filter(1)>. You can use the +filter like this: + + nbdkit --filter=xz file file.xz + =head1 DESCRIPTION C<nbdkit-xz-plugin> is a file serving plugin for L<nbdkit(1)>. -- 2.19.0.rc0
Eric Blake
2018-Nov-21 15:59 UTC
Re: [Libguestfs] [PATCH nbdkit 0/2] Rewrite xz plugin as a filter.
On 11/21/18 9:46 AM, Richard W.M. Jones wrote:> Matt asked if xz should really be a filter rather than a plugin. The > answer is yes, of course it should be! That's been something in the > todo file for a while. > > The commit converts the xz plugin code into a filter (leaving the > plugin around, but deprecating it). > > plugin: nbdkit xz file.xz > filter: nbdkit --filter=xz file file.xz > > plugin: # can't be done > filter: nbdkit --filter=xz curl url=https://example.com/disk.xzAnd further: nbdkit --filter=cache --filter=xz curl url=... to take advantage of local caching rather than repeated curl requests.> > This is only lightly tested but it works for local files and for the > curl example given in the commit message. Unfortunately because of > the very large block size used in the Fedora cloud image, the curl > example is barely usable. We should get them to use a more reasonable > block size such as 16M (currently 192M).This may be the first real random-access of a remote xz file that makes the argument for a smaller block size :) Of course, when you switch to a smaller block size, the xz image can't compress quite as far, but hopefully the size difference is not that bad. Do you have actual numbers comparing the file size, vs. the speed changes made possible by the difference in block size? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2018-Nov-21 16:05 UTC
Re: [Libguestfs] [PATCH nbdkit 0/2] Rewrite xz plugin as a filter.
On Wed, Nov 21, 2018 at 09:59:51AM -0600, Eric Blake wrote:> On 11/21/18 9:46 AM, Richard W.M. Jones wrote: > >Matt asked if xz should really be a filter rather than a plugin. The > >answer is yes, of course it should be! That's been something in the > >todo file for a while. > > > >The commit converts the xz plugin code into a filter (leaving the > >plugin around, but deprecating it). > > > > plugin: nbdkit xz file.xz > > filter: nbdkit --filter=xz file file.xz > > > > plugin: # can't be done > > filter: nbdkit --filter=xz curl url=https://example.com/disk.xz > > And further: > > nbdkit --filter=cache --filter=xz curl url=... > > to take advantage of local caching rather than repeated curl requests.The xz plugin includes a block cache, and now so does the filter. (The plugin long predates our addition of filters into nbdkit). I suppose there is a case for removing the block cache code from the xz plugin, and relying on the cache filter instead. I'll test that out to see if it makes a difference. It will certainly simplify the xz filter code if we did that, but at the cost of making it a bit more complex to use.> >This is only lightly tested but it works for local files and for the > >curl example given in the commit message. Unfortunately because of > >the very large block size used in the Fedora cloud image, the curl > >example is barely usable. We should get them to use a more reasonable > >block size such as 16M (currently 192M). > > This may be the first real random-access of a remote xz file that > makes the argument for a smaller block size :) > > Of course, when you switch to a smaller block size, the xz image > can't compress quite as far, but hopefully the size difference is > not that bad. Do you have actual numbers comparing the file size, > vs. the speed changes made possible by the difference in block size?In fact yes I do. I measured < 1% overhead with a 16M block size (see nbdkit-xz-plugin man page for details). This is not particularly surprising since 16M is still pretty large. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Eric Blake
2018-Nov-21 16:11 UTC
Re: [Libguestfs] [PATCH nbdkit 0/2] Rewrite xz plugin as a filter.
On 11/21/18 9:59 AM, Eric Blake wrote:> On 11/21/18 9:46 AM, Richard W.M. Jones wrote: >> Matt asked if xz should really be a filter rather than a plugin. The >> answer is yes, of course it should be! That's been something in the >> todo file for a while. >> >> The commit converts the xz plugin code into a filter (leaving the >> plugin around, but deprecating it). >> >> plugin: nbdkit xz file.xz >> filter: nbdkit --filter=xz file file.xz >> >> plugin: # can't be done >> filter: nbdkit --filter=xz curl url=https://example.com/disk.xz > > And further: > > nbdkit --filter=cache --filter=xz curl url=... > > to take advantage of local caching rather than repeated curl requests.Actually, after reading patch 1, I see that you already do some caching directly in the xz filter (copied from the xz plugin), so sticking --filter=cache in front of --filter=xz probably won't buy much performance, but using: nbdkit --filter=xz --filter=cache curl url=... will have two layers of caching (one of the compressed xz data read from curl, another of the uncompressed data in the xz filter). Or, we could argue that the xz filter itself doesn't need to concern itself with caching (for less code in the xz filter), and just recommend that the user supply the --filter=cache themselves for performance (but then you wonder if we should start considering nbdkit being able to use the same filter more than once in its chain, which right now is not possible). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Reasonably Related Threads
- Re: [PATCH nbdkit 0/2] Rewrite xz plugin as a filter.
- Re: [PATCH nbdkit 0/2] Rewrite xz plugin as a filter.
- [PATCH nbdkit 0/2] Rewrite xz plugin as a filter.
- [PATCH nbdkit] tests: xz: Use 16M block size when preparing disk for xz plugin test.
- Re: [PATCH nbdkit] New filter: gzip