Richard W.M. Jones
2018-Nov-21 19:00 UTC
[Libguestfs] [PATCH nbdkit v2 0/3] Rewrite xz plugin as a filter.
v2: - Fixes a number of bugs in corner cases. - Uses a 1M block size to fetch from the underlying plugin. This improves performance considerably. I also tested this much more thoroughly and can't find any more bugs. Rich.
Richard W.M. Jones
2018-Nov-21 19:00 UTC
[Libguestfs] [PATCH nbdkit v2 1/3] Rewrite xz plugin as a filter.
The xz plugin should really be a filter so that it can be applied on top of other sources, especially curl: nbdkit --filter=xz curl url=https://download.fedoraproject.org/pub/fedora/linux/releases/29/Cloud/x86_64/images/Fedora-Cloud-Base-29-1.2.x86_64.raw.xz This repurposes the xz test to test only the filter. The plugin is no longer tested as it will be deprecated in the next commit. --- TODO | 2 - configure.ac | 8 +- filters/xz/Makefile.am | 71 +++++ filters/xz/blkcache.c | 146 ++++++++++ filters/xz/blkcache.h | 50 ++++ filters/xz/nbdkit-xz-filter.pod | 91 ++++++ filters/xz/xz.c | 230 +++++++++++++++ filters/xz/xzfile.c | 502 ++++++++++++++++++++++++++++++++ filters/xz/xzfile.h | 71 +++++ tests/Makefile.am | 38 +-- tests/test-xz.c | 4 +- 11 files changed, 1187 insertions(+), 26 deletions(-) diff --git a/TODO b/TODO index f732780..d37f594 100644 --- a/TODO +++ b/TODO @@ -90,8 +90,6 @@ Suggestions for filters * tar plugin should really be a filter -* xz plugin should really be a filter - * gzip plugin should really be a filter * masking plugin features for testing clients (see 'nozero' and 'fua' diff --git a/configure.ac b/configure.ac index b355915..1325767 100644 --- a/configure.ac +++ b/configure.ac @@ -599,10 +599,10 @@ AS_IF([test "$with_zlib" != "no"],[ ]) AM_CONDITIONAL([HAVE_ZLIB],[test "x$ZLIB_LIBS" != "x"]) -dnl Check for liblzma (only if you want to compile the xz plugin). +dnl Check for liblzma (only if you want to compile the xz filter). AC_ARG_WITH([liblzma], [AS_HELP_STRING([--without-liblzma], - [disable xz plugin @<:@default=check@:>@])], + [disable xz filter @<:@default=check@:>@])], [], [with_liblzma=check]) AS_IF([test "$with_liblzma" != "no"],[ @@ -611,7 +611,7 @@ AS_IF([test "$with_liblzma" != "no"],[ AC_SUBST([LIBLZMA_LIBS]) AC_DEFINE([HAVE_LIBLZMA],[1],[liblzma found at compile time.]) ], - [AC_MSG_WARN([liblzma not found, xz plugin will be disabled])]) + [AC_MSG_WARN([liblzma not found, xz filter will be disabled])]) ]) AM_CONDITIONAL([HAVE_LIBLZMA],[test "x$LIBLZMA_LIBS" != "x"]) @@ -728,6 +728,7 @@ filters="\ offset \ partition \ truncate \ + xz \ " AC_SUBST([plugins]) AC_SUBST([lang_plugins]) @@ -790,6 +791,7 @@ AC_CONFIG_FILES([Makefile filters/offset/Makefile filters/partition/Makefile filters/truncate/Makefile + filters/xz/Makefile src/Makefile src/nbdkit.pc tests/functions.sh diff --git a/filters/xz/Makefile.am b/filters/xz/Makefile.am new file mode 100644 index 0000000..4b4edc7 --- /dev/null +++ b/filters/xz/Makefile.am @@ -0,0 +1,71 @@ +# nbdkit +# Copyright (C) 2013-2018 Red Hat Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +include $(top_srcdir)/common-rules.mk + +EXTRA_DIST = nbdkit-xz-filter.pod + +if HAVE_LIBLZMA + +filter_LTLIBRARIES = nbdkit-xz-filter.la + +nbdkit_xz_filter_la_SOURCES = \ + blkcache.c \ + blkcache.h \ + xz.c \ + xzfile.c \ + xzfile.h \ + $(top_srcdir)/include/nbdkit-filter.h + +nbdkit_xz_filter_la_CPPFLAGS = \ + -I$(top_srcdir)/include +nbdkit_xz_filter_la_CFLAGS = \ + $(WARNINGS_CFLAGS) \ + $(LIBLZMA_CFLAGS) +nbdkit_xz_filter_la_LIBADD = \ + $(LIBLZMA_LIBS) +nbdkit_xz_filter_la_LDFLAGS = \ + -module -avoid-version -shared + +if HAVE_POD + +man_MANS = nbdkit-xz-filter.1 +CLEANFILES += $(man_MANS) + +nbdkit-xz-filter.1: nbdkit-xz-filter.pod + $(PODWRAPPER) --section=1 --man $@ \ + --html $(top_builddir)/html/$@.html \ + $< + +endif HAVE_POD + +endif diff --git a/filters/xz/blkcache.c b/filters/xz/blkcache.c new file mode 100644 index 0000000..65c8a03 --- /dev/null +++ b/filters/xz/blkcache.c @@ -0,0 +1,146 @@ +/* nbdkit + * Copyright (C) 2013 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <stdint.h> +#include <inttypes.h> + +#include <nbdkit-plugin.h> + +#include "blkcache.h" + +/* Implemented as a very simple LRU list with a fixed depth. */ +struct blkcache { + size_t maxdepth; + struct block *blocks; + blkcache_stats stats; +}; + +struct block { + uint64_t start; + uint64_t size; + char *data; +}; + +blkcache * +new_blkcache (size_t maxdepth) +{ + blkcache *c; + + c = malloc (sizeof *c); + if (!c) { + nbdkit_error ("malloc: %m"); + return NULL; + } + + c->blocks = calloc (maxdepth, sizeof (struct block)); + if (!c->blocks) { + nbdkit_error ("calloc: %m"); + free (c); + return NULL; + } + c->maxdepth = maxdepth; + c->stats.hits = c->stats.misses = 0; + + return c; +} + +void +free_blkcache (blkcache *c) +{ + size_t i; + + for (i = 0; i < c->maxdepth; ++i) + free (c->blocks[i].data); + free (c->blocks); + free (c); +} + +char * +get_block (blkcache *c, uint64_t offset, uint64_t *start, uint64_t *size) +{ + size_t i; + struct block tmp; + + for (i = 0; i < c->maxdepth; ++i) { + if (c->blocks[i].data != NULL && + c->blocks[i].start <= offset && + offset < c->blocks[i].start + c->blocks[i].size) { + /* This block is now most recently used, so put it at the start. */ + if (i > 0) { + tmp = c->blocks[0]; + c->blocks[0] = c->blocks[i]; + c->blocks[i] = tmp; + } + + c->stats.hits++; + *start = c->blocks[0].start; + *size = c->blocks[0].size; + return c->blocks[0].data; + } + } + + c->stats.misses++; + return NULL; +} + +int +put_block (blkcache *c, uint64_t start, uint64_t size, char *data) +{ + size_t i; + + /* Eject the least recently used block. */ + i = c->maxdepth-1; + if (c->blocks[i].data != NULL) + free (c->blocks[i].data); + + for (; i >= 1; --i) + c->blocks[i] = c->blocks[i-1]; + + /* The new block is most recently used, so it goes at the start. */ + c->blocks[0].start = start; + c->blocks[0].size = size; + c->blocks[0].data = data; + + return 0; +} + +void +blkcache_get_stats (blkcache *c, blkcache_stats *ret) +{ + memcpy (ret, &c->stats, sizeof (c->stats)); +} diff --git a/filters/xz/blkcache.h b/filters/xz/blkcache.h new file mode 100644 index 0000000..e68f5d1 --- /dev/null +++ b/filters/xz/blkcache.h @@ -0,0 +1,50 @@ +/* nbdkit + * Copyright (C) 2013 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#ifndef NBDKIT_BLKCACHE_H +#define NBDKIT_BLKCACHE_H + +typedef struct blkcache blkcache; + +typedef struct blkcache_stats { + size_t hits; + size_t misses; +} blkcache_stats; + +extern blkcache *new_blkcache (size_t maxdepth); +extern void free_blkcache (blkcache *); +extern char *get_block (blkcache *, uint64_t offset, uint64_t *start, uint64_t *size); +extern int put_block (blkcache *, uint64_t start, uint64_t size, char *data); +extern void blkcache_get_stats (blkcache *, blkcache_stats *ret); + +#endif /* NBDKIT_XZFILE_H */ diff --git a/filters/xz/nbdkit-xz-filter.pod b/filters/xz/nbdkit-xz-filter.pod new file mode 100644 index 0000000..e595388 --- /dev/null +++ b/filters/xz/nbdkit-xz-filter.pod @@ -0,0 +1,91 @@ +=head1 NAME + +nbdkit-xz-filter - nbdkit xz filter + +=head1 SYNOPSIS + + nbdkit --filter=xz file FILENAME.xz + + nbdkit --filter=xz curl url=https://example.com/FILENAME.xz + +=head1 DESCRIPTION + +C<nbdkit-xz-filter> is a filter for L<nbdkit(1)> which uncompresses +the underlying plugin on the fly. The filter only supports read-only +connections. + +=head2 Getting best random access performance from xz + +L<xz(1)> files are split into streams and blocks. Most xz files +contain only one stream which contains one or more blocks. You can +find out how many streams and blocks are in an xz file by doing: + + $ xz --list winxp.img.xz + Strms Blocks Compressed Uncompressed Ratio Check Filename + 1 1 2,100.0 MiB 6,144.0 MiB 0.342 CRC64 winxp.img.xz + ↑↑↑ ↑↑↑ + streams blocks + +xz files are seekable on block boundaries only. Seeking is done by +seeking directly to the lower block boundary, then uncompressing data +until the precise byte is reached. As uncompressing xz data is slow, +B<to get best random access performance, you must prepare your xz +files with many small blocks.> + +Use the I<--block-size> option with a small-ish block size. For +example this is the same image compressed with a 16 MB block size: + + $ xz --best --block-size=16777216 winxp.img + ───────────────────── + $ xz --list winxp.img.xz + Strms Blocks Compressed Uncompressed Ratio Check Filename + 1 384 2,120.1 MiB 6,144.0 MiB 0.345 CRC64 winxp.img.xz + ↑↑↑ + blocks + +This file can be accessed randomly. At most 16 MB will have to be +uncompressed to seek to any byte. + +As you would expect, xz cannot compress as efficiently when using a +smaller block size. The space penalty in the above example is +S<E<lt> 1%> of the compressed file size. + +=head1 PARAMETERS + +=over 4 + +=item B<xz-max-block=>SIZE + +The maximum block size that the filter will read. The filter will +refuse to read xz files that contain any block larger than this size. + +See the discussion above about creating xz files with small block +sizes in order to reduce memory usage and increase performance. + +This parameter is optional. If not specified it defaults to 512M. + +=item B<xz-max-depth=>N + +Maximum number of blocks stored in the LRU block cache. + +This parameter is optional. If not specified it defaults to 8. + +The filter may allocate up to +S<maximum block size in file * maxdepth> +bytes of memory I<per connection>. + +=back + +=head1 SEE ALSO + +L<nbdkit(1)>, +L<nbdkit-filter(3)>, +L<xz(1)>. + +=head1 AUTHORS + +Richard W.M. Jones + +=head1 COPYRIGHT + +Copyright (C) 2013-2018 Red Hat Inc. diff --git a/filters/xz/xz.c b/filters/xz/xz.c new file mode 100644 index 0000000..5610187 --- /dev/null +++ b/filters/xz/xz.c @@ -0,0 +1,230 @@ +/* nbdkit + * Copyright (C) 2013-2018 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdarg.h> +#include <stdint.h> +#include <inttypes.h> +#include <string.h> +#include <unistd.h> +#include <errno.h> + +#include <lzma.h> + +#include <nbdkit-filter.h> + +#include "xzfile.h" +#include "blkcache.h" + +static uint64_t maxblock = 512 * 1024 * 1024; +static size_t maxdepth = 8; + +static int +xz_config (nbdkit_next_config *next, void *nxdata, + const char *key, const char *value) +{ + if (strcmp (key, "xz-max-block") == 0) { + int64_t r = nbdkit_parse_size (value); + if (r == -1) + return -1; + maxblock = (uint64_t) r; + return 0; + } + else if (strcmp (key, "xz-max-depth") == 0) { + size_t r; + + if (sscanf (value, "%zu", &r) != 1) { + nbdkit_error ("could not parse 'xz-max-depth' parameter"); + return -1; + } + if (r == 0) { + nbdkit_error ("'xz-max-depth' parameter must be >= 1"); + return -1; + } + + maxdepth = r; + return 0; + } + else + return next (nxdata, key, value); +} + +#define xz_config_help \ + "xz-max-block=<SIZE> (optional) Maximum block size allowed (default: 512M)\n"\ + "xz-max-depth=<N> (optional) Maximum blocks in cache (default: 4)\n" + +/* The per-connection handle. */ +struct xz_handle { + xzfile *xz; + + /* Block cache. */ + blkcache *c; +}; + +/* Create the per-connection handle. */ +static void * +xz_open (nbdkit_next_open *next, void *nxdata, int readonly) +{ + struct xz_handle *h; + + /* Always pass readonly=1 to the underlying plugin. */ + if (next (nxdata, 1) == -1) + return NULL; + + h = malloc (sizeof *h); + if (h == NULL) { + nbdkit_error ("malloc: %m"); + return NULL; + } + + h->c = new_blkcache (maxdepth); + if (!h->c) { + free (h); + return NULL; + } + + /* Initialized in xz_prepare. */ + h->xz = NULL; + + return h; +} + +/* Free up the per-connection handle. */ +static void +xz_close (void *handle) +{ + struct xz_handle *h = handle; + blkcache_stats stats; + + if (h->c) { + blkcache_get_stats (h->c, &stats); + + nbdkit_debug ("cache: hits = %zu, misses = %zu", stats.hits, stats.misses); + } + + xzfile_close (h->xz); + free_blkcache (h->c); + free (h); +} + +static int +xz_prepare (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) +{ + struct xz_handle *h = handle; + + h->xz = xzfile_open (next_ops, nxdata); + if (!h->xz) + return -1; + + if (maxblock < xzfile_max_uncompressed_block_size (h->xz)) { + nbdkit_error ("xz file largest block is bigger than maxblock\n" + "Either recompress the xz file with smaller blocks (see nbdkit-xz-plugin(1))\n" + "or make maxblock parameter bigger.\n" + "maxblock = %" PRIu64 " (bytes)\n" + "largest block in xz file = %" PRIu64 " (bytes)", + maxblock, + xzfile_max_uncompressed_block_size (h->xz)); + return -1; + } + + return 0; +} + +/* Get the file size. */ +static int64_t +xz_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) +{ + struct xz_handle *h = handle; + + return xzfile_get_size (h->xz); +} + +/* Read data from the file. */ +static int +xz_pread (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, void *buf, uint32_t count, uint64_t offset, + uint32_t flags, int *err) +{ + struct xz_handle *h = handle; + char *data; + uint64_t start, size; + uint32_t n; + + /* Find the block in the cache. */ + data = get_block (h->c, offset, &start, &size); + if (!data) { + /* Not in the cache. We need to read the block from the xz file. */ + data = xzfile_read_block (h->xz, next_ops, nxdata, flags, err, + offset, &start, &size); + if (data == NULL) + return -1; + put_block (h->c, start, size, data); + } + + /* It's possible if the blocks are really small or oddly aligned or + * if the requests are large that we need to read the following + * block to satisfy the request. + */ + n = count; + if (start + size - offset < n) + n = start + size - offset; + + memcpy (buf, &data[offset-start], n); + buf += n; + count -= n; + offset += n; + if (count > 0) + return xz_pread (next_ops, nxdata, h, buf, count, offset, flags, err); + + return 0; +} + +#define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS + +static struct nbdkit_filter filter = { + .name = "xz", + .longname = "nbdkit XZ filter", + .version = PACKAGE_VERSION, + .config = xz_config, + .config_help = xz_config_help, + .open = xz_open, + .close = xz_close, + .prepare = xz_prepare, + .get_size = xz_get_size, + .pread = xz_pread, +}; + +NBDKIT_REGISTER_FILTER(filter) diff --git a/filters/xz/xzfile.c b/filters/xz/xzfile.c new file mode 100644 index 0000000..c5e1da1 --- /dev/null +++ b/filters/xz/xzfile.c @@ -0,0 +1,502 @@ +/* nbdkit + * Copyright (C) 2013 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +/* liblzma is a complex interface, so abstract it here. */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <string.h> +#include <stdint.h> +#include <inttypes.h> +#include <unistd.h> +#include <fcntl.h> +#include <sys/types.h> + +#include <nbdkit-filter.h> + +#include <lzma.h> + +#include "xzfile.h" + +#ifndef O_CLOEXEC +#define O_CLOEXEC 0 +#endif + +#define XZ_HEADER_MAGIC "\xfd" "7zXZ\0" +#define XZ_HEADER_MAGIC_LEN 6 +#define XZ_FOOTER_MAGIC "YZ" +#define XZ_FOOTER_MAGIC_LEN 2 + +struct xzfile { + lzma_index *idx; + size_t nr_streams; + size_t nr_blocks; + uint64_t max_uncompressed_block_size; +}; + +static bool check_header_magic (struct nbdkit_next_ops *next_ops, void *nxdata); +static lzma_index *parse_indexes (struct nbdkit_next_ops *next_ops, void *nxdata, size_t *); +static int iter_indexes (lzma_index *idx, size_t *, uint64_t *); + +xzfile * +xzfile_open (struct nbdkit_next_ops *next_ops, void *nxdata) +{ + xzfile *xz; + uint64_t size; + + xz = malloc (sizeof *xz); + if (xz == NULL) { + nbdkit_error ("malloc: %m"); + return NULL; + } + + /* Check file magic. */ + if (!check_header_magic (next_ops, nxdata)) { + nbdkit_error ("xz: not an xz file"); + goto err1; + } + + /* Read and parse the indexes. */ + xz->idx = parse_indexes (next_ops, nxdata, &xz->nr_streams); + if (xz->idx == NULL) + goto err1; + + /* Iterate over indexes to find the number of and largest block. */ + if (iter_indexes (xz->idx, + &xz->nr_blocks, &xz->max_uncompressed_block_size) == -1) + goto err1; + + size = lzma_index_uncompressed_size (xz->idx); + nbdkit_debug ("xz: size %" PRIu64 " bytes (%.1fM)", + size, size / 1024.0 / 1024.0); + nbdkit_debug ("xz: %zu streams, %zu blocks", xz->nr_streams, xz->nr_blocks); + nbdkit_debug ("xz: maximum uncompressed block size %" PRIu64 " bytes (%.1fM)", + xz->max_uncompressed_block_size, + xz->max_uncompressed_block_size / 1024.0 / 1024.0); + + return xz; + + err1: + free (xz); + return NULL; +} + +static bool +check_header_magic (struct nbdkit_next_ops *next_ops, void *nxdata) +{ + char buf[XZ_HEADER_MAGIC_LEN]; + int err; + + if (next_ops->pread (nxdata, buf, XZ_HEADER_MAGIC_LEN, 0, 0, &err) == -1) { + nbdkit_error ("xz: could not read header magic: error %d", err); + return false; + } + if (memcmp (buf, XZ_HEADER_MAGIC, XZ_HEADER_MAGIC_LEN) != 0) + return false; + return true; +} + +/* For explanation of this function, see src/xz/list.c:parse_indexes + * in the xz sources. + */ +static lzma_index * +parse_indexes (struct nbdkit_next_ops *next_ops, void *nxdata, + size_t *nr_streams) +{ + lzma_ret r; + int64_t size, pos, index_size, offs; + int err; + uint8_t footer[LZMA_STREAM_HEADER_SIZE]; + uint8_t header[LZMA_STREAM_HEADER_SIZE]; + lzma_stream_flags footer_flags; + lzma_stream_flags header_flags; + lzma_stream strm = LZMA_STREAM_INIT; + lzma_index *combined_index = NULL; + lzma_index *this_index = NULL; + lzma_vli stream_padding = 0; + + *nr_streams = 0; + + /* Check file size is a multiple of 4 bytes. */ + pos = size = next_ops->get_size (nxdata); + if (pos == -1) { + nbdkit_error ("xz: get_size: %m"); + goto err; + } + if ((pos & 3) != 0) { + nbdkit_error ("xz: not an xz file: size is not a multiple of 4 bytes"); + goto err; + } + + /* Jump backwards through the file identifying each stream. */ + while (pos > 0) { + nbdkit_debug ("looping through streams: pos = %" PRIi64, pos); + + if (pos < LZMA_STREAM_HEADER_SIZE) { + nbdkit_error ("xz: corrupted file at %" PRIi64, pos); + goto err; + } + + if (next_ops->pread (nxdata, footer, LZMA_STREAM_HEADER_SIZE, + pos - LZMA_STREAM_HEADER_SIZE, + 0, &err) == -1) { + nbdkit_error ("xz: read stream footer: error %d", err); + goto err; + } + /* Skip stream padding. */ + if (footer[8] == 0 && footer[9] == 0 && + footer[10] == 0 && footer[11] == 0) { + stream_padding += 4; + pos -= 4; + continue; + } + + pos -= LZMA_STREAM_HEADER_SIZE; + (*nr_streams)++; + + nbdkit_debug ("decode stream footer at pos = %" PRIi64, pos); + + /* Does the stream footer look reasonable? */ + r = lzma_stream_footer_decode (&footer_flags, footer); + if (r != LZMA_OK) { + nbdkit_error ("xz: invalid stream footer (error %d)", r); + goto err; + } + nbdkit_debug ("backward_size = %" PRIu64, + (uint64_t) footer_flags.backward_size); + index_size = footer_flags.backward_size; + if (pos < index_size + LZMA_STREAM_HEADER_SIZE) { + nbdkit_error ("xz: invalid stream footer"); + goto err; + } + + pos -= index_size; + nbdkit_debug ("decode index at pos = %" PRIi64, pos); + + /* Decode the index. */ + r = lzma_index_decoder (&strm, &this_index, UINT64_MAX); + if (r != LZMA_OK) { + nbdkit_error ("xz: invalid stream index (error %d)", r); + goto err; + } + + offs = pos; + do { + uint8_t buf[BUFSIZ]; + + strm.avail_in = index_size; + if (strm.avail_in > BUFSIZ) + strm.avail_in = BUFSIZ; + if (pos + strm.avail_in > size) + strm.avail_in = size - pos; + + if (next_ops->pread (nxdata, buf, strm.avail_in, offs, 0, &err) == -1) { + nbdkit_error ("xz: read index: error %d", err); + goto err; + } + offs += strm.avail_in; + index_size -= strm.avail_in; + + strm.next_in = buf; + r = lzma_code (&strm, LZMA_RUN); + } while (r == LZMA_OK); + + if (r != LZMA_STREAM_END) { + nbdkit_error ("xz: could not parse index (error %d)", r); + goto err; + } + + pos -= lzma_index_total_size (this_index) + LZMA_STREAM_HEADER_SIZE; + + nbdkit_debug ("decode stream header at pos = %" PRIi64, pos); + + /* Read and decode the stream header. */ + if (next_ops->pread (nxdata, header, LZMA_STREAM_HEADER_SIZE, pos, 0, + &err) == -1) { + nbdkit_error ("xz: read stream header: error %d", err); + goto err; + } + + r = lzma_stream_header_decode (&header_flags, header); + if (r != LZMA_OK) { + nbdkit_error ("xz: invalid stream header (error %d)", r); + goto err; + } + + /* Header and footer of the stream should be equal. */ + r = lzma_stream_flags_compare (&header_flags, &footer_flags); + if (r != LZMA_OK) { + nbdkit_error ("xz: header and footer of stream are not equal (error %d)", + r); + goto err; + } + + /* Store the decoded stream flags in this_index. */ + r = lzma_index_stream_flags (this_index, &footer_flags); + if (r != LZMA_OK) { + nbdkit_error ("xz: cannot read stream_flags from index (error %d)", r); + goto err; + } + + /* Store the amount of stream padding so far. Needed to calculate + * compressed offsets correctly in multi-stream files. + */ + r = lzma_index_stream_padding (this_index, stream_padding); + if (r != LZMA_OK) { + nbdkit_error ("xz: cannot set stream_padding in index (error %d)", r); + goto err; + } + + if (combined_index != NULL) { + r = lzma_index_cat (this_index, combined_index, NULL); + if (r != LZMA_OK) { + nbdkit_error ("xz: cannot combine indexes"); + goto err; + } + } + + combined_index = this_index; + this_index = NULL; + } + + lzma_end (&strm); + + return combined_index; + + err: + lzma_end (&strm); + lzma_index_end (this_index, NULL); + lzma_index_end (combined_index, NULL); + return NULL; +} + +/* Iterate over the indexes to find the number of blocks and + * the largest block. + */ +static int +iter_indexes (lzma_index *idx, + size_t *nr_blocks, uint64_t *max_uncompressed_block_size) +{ + lzma_index_iter iter; + + *nr_blocks = 0; + *max_uncompressed_block_size = 0; + + lzma_index_iter_init (&iter, idx); + while (!lzma_index_iter_next (&iter, LZMA_INDEX_ITER_NONEMPTY_BLOCK)) { + if (iter.block.uncompressed_size > *max_uncompressed_block_size) + *max_uncompressed_block_size = iter.block.uncompressed_size; + (*nr_blocks)++; + } + + return 0; +} + +void +xzfile_close (xzfile *xz) +{ + if (xz) { + lzma_index_end (xz->idx, NULL); + free (xz); + } +} + +uint64_t +xzfile_max_uncompressed_block_size (xzfile *xz) +{ + return xz->max_uncompressed_block_size; +} + +uint64_t +xzfile_get_size (xzfile *xz) +{ + return lzma_index_uncompressed_size (xz->idx); +} + +char * +xzfile_read_block (xzfile *xz, + struct nbdkit_next_ops *next_ops, + void *nxdata, uint32_t flags, int *err, + uint64_t offset, + uint64_t *start_rtn, uint64_t *size_rtn) +{ + int64_t offs, size; + lzma_index_iter iter; + uint8_t header[LZMA_BLOCK_HEADER_SIZE_MAX]; + lzma_block block; + lzma_filter filters[LZMA_FILTERS_MAX + 1]; + lzma_ret r; + lzma_stream strm = LZMA_STREAM_INIT; + const size_t bufsize = 1024 * 1024; + unsigned char *buf = NULL; + char *data = NULL; + size_t i; + + /* Read the total size of the underlying disk, so we don't + * read over the end. + */ + size = next_ops->get_size (nxdata); + if (size == -1) { + nbdkit_error ("xz: get_size: %m"); + return NULL; + } + + /* Locate the block containing the uncompressed offset. */ + lzma_index_iter_init (&iter, xz->idx); + if (lzma_index_iter_locate (&iter, offset)) { + nbdkit_error ("cannot find offset %" PRIu64 " in the xz file", offset); + return NULL; + } + + *start_rtn = iter.block.uncompressed_file_offset; + *size_rtn = iter.block.uncompressed_size; + + nbdkit_debug ("seek: block number %d at file offset %" PRIu64, + (int) iter.block.number_in_file, + (uint64_t) iter.block.compressed_file_offset); + + /* Read the block header. Start by reading a single byte which + * tell us how big the block header is. + */ + offs = iter.block.compressed_file_offset; + if (next_ops->pread (nxdata, header, 1, offs, 0, err) == -1) { + nbdkit_error ("xz: read: could not read block header byte: error %d", *err); + return NULL; + } + offs++; + + if (header[0] == '\0') { + nbdkit_error ("xz: read: unexpected invalid block in file, header[0] = 0"); + return NULL; + } + + block.version = 0; + block.check = iter.stream.flags->check; + block.filters = filters; + block.header_size = lzma_block_header_size_decode (header[0]); + + /* Now read and decode the block header. */ + if (next_ops->pread (nxdata, &header[1], block.header_size-1, offs, + 0, err) == -1) { + nbdkit_error ("xz: read: could not read block of compressed data: " + "error %d", *err); + return NULL; + } + offs += block.header_size - 1; + + r = lzma_block_header_decode (&block, NULL, header); + if (r != LZMA_OK) { + nbdkit_error ("invalid block header (error %d)", r); + return NULL; + } + + /* What this actually does is it checks that the block header + * matches the index. + */ + r = lzma_block_compressed_size (&block, iter.block.unpadded_size); + if (r != LZMA_OK) { + nbdkit_error ("cannot calculate compressed size (error %d)", r); + goto err1; + } + + /* Read the block data. */ + r = lzma_block_decoder (&strm, &block); + if (r != LZMA_OK) { + nbdkit_error ("invalid block (error %d)", r); + goto err1; + } + + data = malloc (*size_rtn); + if (data == NULL) { + nbdkit_error ("malloc (%" PRIu64 " bytes): %m\n" + "NOTE: If this error occurs, you need to recompress your xz files with a smaller block size. Use: 'xz --block-size=16777216 ...'.", + *size_rtn); + goto err2; + } + + buf = malloc (bufsize); + if (buf == NULL) { + nbdkit_error ("malloc: %m"); + goto err2; + } + + strm.next_in = NULL; + strm.avail_in = 0; + strm.next_out = (uint8_t *) data; + strm.avail_out = block.uncompressed_size; + do { + if (strm.avail_in == 0) { + strm.avail_in = bufsize; + if (offs + strm.avail_in > size) + strm.avail_in = size - offs; + if (strm.avail_in > 0) { + strm.next_in = buf; + if (next_ops->pread (nxdata, buf, strm.avail_in, offs, 0, err) == -1) { + nbdkit_error ("xz: read: error %d", *err); + goto err2; + } + offs += strm.avail_in; + } + } + + r = lzma_code (&strm, LZMA_RUN); + } while (r == LZMA_OK); + + if (r != LZMA_OK && r != LZMA_STREAM_END) { + nbdkit_error ("could not parse block data (error %d)", r); + goto err2; + } + + lzma_end (&strm); + + for (i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i) + free (filters[i].options); + + free (buf); + + return data; + + err2: + lzma_end (&strm); + err1: + for (i = 0; filters[i].id != LZMA_VLI_UNKNOWN; ++i) + free (filters[i].options); + + free (data); + free (buf); + + return NULL; +} diff --git a/filters/xz/xzfile.h b/filters/xz/xzfile.h new file mode 100644 index 0000000..9c9ddbc --- /dev/null +++ b/filters/xz/xzfile.h @@ -0,0 +1,71 @@ +/* nbdkit + * Copyright (C) 2013-2018 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +/* liblzma is a complex interface, so abstract it here. */ + +#ifndef NBDKIT_XZFILE_H +#define NBDKIT_XZFILE_H + +#include <nbdkit-filter.h> + +typedef struct xzfile xzfile; + +/* Open (and verify) the named xz file. */ +extern xzfile *xzfile_open (struct nbdkit_next_ops *next_ops, void *nxdata); + +/* Close the file and free up all resources. */ +extern void xzfile_close (xzfile *); + +/* Get (uncompressed) size of the largest block in the file. */ +extern uint64_t xzfile_max_uncompressed_block_size (xzfile *); + +/* Get the total uncompressed size of the file. */ +extern uint64_t xzfile_get_size (xzfile *); + +/* Read the xz file block that contains the byte at 'offset' in the + * uncompressed file. + * + * The uncompressed block of data, which probably begins before the + * requested byte and ends after it, is returned. The caller must + * free it. NULL is returned if there was an error. + * + * The start offset & size of the block relative to the uncompressed + * file are returned in *start and *size. + */ +extern char *xzfile_read_block (xzfile *xz, + struct nbdkit_next_ops *next_ops, + void *nxdata, uint32_t flags, int *err, + uint64_t offset, + uint64_t *start, uint64_t *size); + +#endif /* NBDKIT_XZFILE_H */ diff --git a/tests/Makefile.am b/tests/Makefile.am index 959d6e6..0cd2366 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -472,25 +472,6 @@ libvixDiskLib_la_CXXFLAGS = \ libvixDiskLib_la_LDFLAGS = \ -shared -version-number 6:0:0 -rpath /nowhere -# xz plugin test. -if HAVE_LIBLZMA -if HAVE_GUESTFISH - -LIBGUESTFS_TESTS += test-xz -check_DATA += disk.xz -MAINTAINERCLEANFILES += disk.xz - -test_xz_SOURCES = test-xz.c test.h -test_xz_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS) -test_xz_LDADD = libtest.la $(LIBGUESTFS_LIBS) - -disk.xz: disk - rm -f $@ - xz --best --block-size=16777216 -c disk > $@ - -endif HAVE_GUESTFISH -endif HAVE_LIBLZMA - # zero plugin test. TESTS += test-zero.sh @@ -746,4 +727,23 @@ TESTS += \ test-truncate2.sh \ test-truncate3.sh +# xz filter test. +if HAVE_LIBLZMA +if HAVE_GUESTFISH + +LIBGUESTFS_TESTS += test-xz +check_DATA += disk.xz +MAINTAINERCLEANFILES += disk.xz + +test_xz_SOURCES = test-xz.c test.h +test_xz_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS) +test_xz_LDADD = libtest.la $(LIBGUESTFS_LIBS) + +disk.xz: disk + rm -f $@ + xz --best --block-size=16777216 -c disk > $@ + +endif HAVE_GUESTFISH +endif HAVE_LIBLZMA + endif HAVE_PLUGINS diff --git a/tests/test-xz.c b/tests/test-xz.c index 46b1d04..7248a96 100644 --- a/tests/test-xz.c +++ b/tests/test-xz.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013 Red Hat Inc. + * Copyright (C) 2013-2018 Red Hat Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -51,7 +51,7 @@ main (int argc, char *argv[]) int r; char *data; - if (test_start_nbdkit ("xz", "-r", "disk.xz", NULL) == -1) + if (test_start_nbdkit ("--filter=xz", "file", "disk.xz", NULL) == -1) exit (EXIT_FAILURE); g = guestfs_create (); -- 2.19.0.rc0
Richard W.M. Jones
2018-Nov-21 19:00 UTC
[Libguestfs] [PATCH nbdkit v2 2/3] xz: Deprecate xz plugin.
Plan to remove it in the next-by-one stable release. --- plugins/xz/nbdkit-xz-plugin.pod | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/plugins/xz/nbdkit-xz-plugin.pod b/plugins/xz/nbdkit-xz-plugin.pod index 2bd0a1c..acd7817 100644 --- a/plugins/xz/nbdkit-xz-plugin.pod +++ b/plugins/xz/nbdkit-xz-plugin.pod @@ -6,6 +6,15 @@ nbdkit-xz-plugin - nbdkit xz plugin nbdkit xz [file=]FILENAME.xz +=head1 DEPRECATED + +B<The xz plugin is deprecated in S<nbdkit E<ge> 1.9.2> and will be +removed in S<nbdkit 1.12>>. It has been replaced with a filter with +the same functionality, see L<nbdkit-xz-filter(1)>. You can use the +filter like this: + + nbdkit --filter=xz file file.xz + =head1 DESCRIPTION C<nbdkit-xz-plugin> is a file serving plugin for L<nbdkit(1)>. -- 2.19.0.rc0
Richard W.M. Jones
2018-Nov-21 19:00 UTC
[Libguestfs] [PATCH nbdkit v2 3/3] xz: tests: Use a very small block size in test disk.
A better test of edge cases which occur between blocks. In fact this change found two bugs while developing the xz filter (not present in the plugin). --- tests/Makefile.am | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tests/Makefile.am b/tests/Makefile.am index 0cd2366..e9b01f2 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -741,7 +741,9 @@ test_xz_LDADD = libtest.la $(LIBGUESTFS_LIBS) disk.xz: disk rm -f $@ - xz --best --block-size=16777216 -c disk > $@ +# We choose a very small block size here only for testing. Normally +# you should choose a 16M block size. + xz --best --block-size=32768 -c disk > $@ endif HAVE_GUESTFISH endif HAVE_LIBLZMA -- 2.19.0.rc0
Apparently Analagous Threads
- [PATCH nbdkit 0/2] Rewrite xz plugin as a filter.
- [PATCH supermin 0/2] Allow an alternate libc to be used for init.
- [PATCH v2 0/3 supermin] URPMI & xz support.
- [PATCH supermin 0/2] Allow an alternate libc to be used for init.
- [nbdkit PATCH 0/2] ext2 export list tweaks