Richard W.M. Jones
2019-Jan-04 09:48 UTC
[Libguestfs] [PATCH nbdkit v5 3/3] cache: Implement cache-max-size and cache space reclaim.
v4: https://www.redhat.com/archives/libguestfs/2019-January/msg00032.html v5: - Now we set the block size at run time. I'd like to say that I was able to test this change, but unfortunately I couldn't find any easy way to create a filesystem on x86-64 with a block size > 4K. Ext4 doesn't support it at all, and XFS doesn't support block size > page size (and I recently gave away my aarch64 machine that ran RHEL, so ..) - Addressed a few other minor issues that Eric found in v4. - Retested on Linux, FreeBSD and OpenBSD. I'm including the minmax patch which I sent before. That's because it isn't independent as I thought it was - it depends on the LRU patch. Rich.
Richard W.M. Jones
2019-Jan-04 09:48 UTC
[Libguestfs] [PATCH nbdkit v5 1/3] cache: Implement LRU structure.
Implement a simple LRU structure for the cache filter using bitmaps. --- filters/cache/lru.h | 54 ++++++++++++++ filters/cache/blk.c | 12 +++ filters/cache/lru.c | 151 ++++++++++++++++++++++++++++++++++++++ filters/cache/Makefile.am | 2 + 4 files changed, 219 insertions(+) diff --git a/filters/cache/lru.h b/filters/cache/lru.h new file mode 100644 index 0000000..4faefcd --- /dev/null +++ b/filters/cache/lru.h @@ -0,0 +1,54 @@ +/* nbdkit + * Copyright (C) 2018 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#ifndef NBDKIT_LRU_H +#define NBDKIT_LRU_H + +#include <stdbool.h> + +/* Initialize LRU. */ +extern void lru_init (void); + +/* Free the LRU. */ +extern void lru_free (void); + +/* Notify LRU that the virtual size has changed. */ +extern int lru_set_size (uint64_t new_size); + +/* Mark a block as recently accessed in the LRU structure. */ +extern void lru_set_recently_accessed (uint64_t blknum); + +/* Check if a block has been recently accessed. */ +extern bool lru_has_been_recently_accessed (uint64_t blknum); + +#endif /* NBDKIT_LRU_H */ diff --git a/filters/cache/blk.c b/filters/cache/blk.c index 4000276..b256446 100644 --- a/filters/cache/blk.c +++ b/filters/cache/blk.c @@ -53,6 +53,7 @@ #include "cache.h" #include "blk.h" +#include "lru.h" /* The cache. */ static int fd = -1; @@ -80,6 +81,8 @@ blk_init (void) size_t len; char *template; + lru_init (); + bitmap_init (&bm, BLKSIZE, 2 /* bits per block */); tmpdir = getenv ("TMPDIR"); @@ -115,6 +118,8 @@ blk_free (void) close (fd); bitmap_free (&bm); + + lru_free (); } int @@ -128,6 +133,9 @@ blk_set_size (uint64_t new_size) return -1; } + if (lru_set_size (new_size) == -1) + return -1; + return 0; } @@ -163,6 +171,7 @@ blk_read (struct nbdkit_next_ops *next_ops, void *nxdata, return -1; } bitmap_set_blk (&bm, blknum, BLOCK_CLEAN); + lru_set_recently_accessed (blknum); } return 0; } @@ -172,6 +181,7 @@ blk_read (struct nbdkit_next_ops *next_ops, void *nxdata, nbdkit_error ("pread: %m"); return -1; } + lru_set_recently_accessed (blknum); return 0; } } @@ -196,6 +206,7 @@ blk_writethrough (struct nbdkit_next_ops *next_ops, void *nxdata, return -1; bitmap_set_blk (&bm, blknum, BLOCK_CLEAN); + lru_set_recently_accessed (blknum); return 0; } @@ -222,6 +233,7 @@ blk_write (struct nbdkit_next_ops *next_ops, void *nxdata, return -1; } bitmap_set_blk (&bm, blknum, BLOCK_DIRTY); + lru_set_recently_accessed (blknum); return 0; } diff --git a/filters/cache/lru.c b/filters/cache/lru.c new file mode 100644 index 0000000..c828968 --- /dev/null +++ b/filters/cache/lru.c @@ -0,0 +1,151 @@ +/* nbdkit + * Copyright (C) 2018 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +/* These are the block operations. They always read or write a single + * whole block of size ‘blksize’. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <stdbool.h> +#include <inttypes.h> + +#include <nbdkit-filter.h> + +#include "bitmap.h" + +#include "cache.h" +#include "blk.h" +#include "lru.h" + +#define MAX(a, b) ((a) > (b) ? (a) : (b)) + +/* LRU bitmaps. These bitmaps implement a simple, fast LRU structure. + * + * bm[0] bm[1] blocks not in either bitmap + * ┌─────────┬──────────────────┬─────────────────────────────┐ + * │ │ │ │ + * └─────────┴──────────────────┴─────────────────────────────┘ + * ↑ c1 bits set + * c0 bits set + * + * The LRU structure keeps track of the [approx] last N distinct + * blocks which have been most recently accessed. It can answer in + * O(1) time the question: “Is a particular block in or not in the N + * distinct blocks most recently accessed?” + * + * To do this we keep two bitmaps. + * + * When a new block is accessed, we set the corresponding bit in bm[0] + * and increment c0 (c0 counts the number of bits set in bm[0]). If + * c0 == N/2 then we swap the two bitmaps, clear bm[0], and reset c0 + * to 0. + * + * To check if a block has been accessed within the previous N + * distinct accesses, we simply have to check both bitmaps. If it is + * not in either bitmap, then it's old and a candidate to be + * reclaimed. + * + * You'll note that in fact we only keep track of between N/2 and N + * recently accessed blocks. We could make the estimate more accurate + * by having more bitmaps, but as this is only a heuristic we choose + * to keep the implementation simple and memory usage low instead. + */ +static struct bitmap bm[2]; +static unsigned c0 = 0, c1 = 0; +static unsigned N = 100; + +void +lru_init (void) +{ + bitmap_init (&bm[0], BLKSIZE, 1 /* bits per block */); + bitmap_init (&bm[1], BLKSIZE, 1 /* bits per block */); +} + +void +lru_free (void) +{ + bitmap_free (&bm[0]); + bitmap_free (&bm[1]); +} + +int +lru_set_size (uint64_t new_size) +{ + if (bitmap_resize (&bm[0], new_size) == -1) + return -1; + if (bitmap_resize (&bm[1], new_size) == -1) + return -1; + + /* XXX Choose this better. */ + N = MAX (new_size / BLKSIZE / 4, 100); + + return 0; +} + +void +lru_set_recently_accessed (uint64_t blknum) +{ + /* If the block is already set in the first bitmap, don't need to do + * anything. + */ + if (bitmap_get_blk (&bm[0], blknum, false)) + return; + + bitmap_set_blk (&bm[0], blknum, true); + c0++; + + /* If we've reached N/2 then we need to swap over the bitmaps. */ + if (c0 >= N/2) { + struct bitmap tmp; + + tmp = bm[0]; + bm[0] = bm[1]; + bm[1] = tmp; + c1 = c0; + + bitmap_clear (&bm[0]); + c0 = 0; + } +} + +bool +lru_has_been_recently_accessed (uint64_t blknum) +{ + return + bitmap_get_blk (&bm[0], blknum, false) || + bitmap_get_blk (&bm[1], blknum, false); +} diff --git a/filters/cache/Makefile.am b/filters/cache/Makefile.am index 3a542af..1620fa1 100644 --- a/filters/cache/Makefile.am +++ b/filters/cache/Makefile.am @@ -41,6 +41,8 @@ nbdkit_cache_filter_la_SOURCES = \ blk.h \ cache.c \ cache.h \ + lru.c \ + lru.h \ $(top_srcdir)/include/nbdkit-filter.h nbdkit_cache_filter_la_CPPFLAGS = \ -- 2.19.2
Richard W.M. Jones
2019-Jan-04 09:48 UTC
[Libguestfs] [PATCH nbdkit v5 2/3] common/include: Add generic MIN and MAX macros.
The preferred implementation uses __auto_type, a GCC extension also now supported by Clang. Unfortunately OpenBSD ships with GCC 4.2.1 (from 2007!) which predates this extension by quite a few years, so we have to be able to fall back to a plain macro. --- configure.ac | 20 ++++++++++- common/include/minmax.h | 63 +++++++++++++++++++++++++++++++++++ filters/blocksize/blocksize.c | 3 +- filters/cache/lru.c | 3 +- filters/nozero/nozero.c | 3 +- plugins/pattern/pattern.c | 3 +- common/include/Makefile.am | 1 + filters/blocksize/Makefile.am | 3 +- filters/nozero/Makefile.am | 3 +- 9 files changed, 93 insertions(+), 9 deletions(-) diff --git a/configure.ac b/configure.ac index 606e632..ee1ff0d 100644 --- a/configure.ac +++ b/configure.ac @@ -156,9 +156,27 @@ you may be using a C compiler which does not support this attribute, or the configure test may be wrong. This code requires the attribute to work for proper locking between threads.])]) -dnl restore CFLAGS CFLAGS="${acx_nbdkit_save_CFLAGS}" +dnl Check for __auto_type (GCC extension). +AC_MSG_CHECKING([if __auto_type is available in this compiler]) +AC_COMPILE_IFELSE([ +AC_LANG_SOURCE([[ +static int +test (int a) +{ + __auto_type at = a; + return at; +} +]]) + ],[ + AC_MSG_RESULT([yes]) + AC_DEFINE([HAVE_AUTO_TYPE],[1],[__auto_type is available]) + ],[ + AC_MSG_RESULT([no]) + ] +) + dnl Check for other headers, all optional. AC_CHECK_HEADERS([\ alloca.h \ diff --git a/common/include/minmax.h b/common/include/minmax.h new file mode 100644 index 0000000..dc01679 --- /dev/null +++ b/common/include/minmax.h @@ -0,0 +1,63 @@ +/* nbdkit + * Copyright (C) 2019 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#ifndef NBDKIT_MINMAX_H +#define NBDKIT_MINMAX_H + +#include <config.h> + +#ifdef HAVE_AUTO_TYPE + +/* __auto_type is a GCC extension, added in GCC 4.9 in 2014, and to + * Clang in 2015. + */ + +#define MIN(x, y) ({ \ + __auto_type _x = (x); \ + __auto_type _y = (y); \ + _x < _y ? _x : _y; \ + }) +#define MAX(x, y) ({ \ + __auto_type _x = (x); \ + __auto_type _y = (y); \ + _x > _y ? _x : _y; \ + }) + +#else + +#define MIN(x, y) ((x) < (y) ? (x) : (y)) +#define MAX(x, y) ((x) > (y) ? (x) : (y)) + +#endif + +#endif /* NBDKIT_MINMAX_H */ diff --git a/filters/blocksize/blocksize.c b/filters/blocksize/blocksize.c index 6acd9b6..34f58c9 100644 --- a/filters/blocksize/blocksize.c +++ b/filters/blocksize/blocksize.c @@ -44,11 +44,12 @@ #include <nbdkit-filter.h> +#include "minmax.h" + /* XXX See design comment in filters/cow/cow.c. */ #define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_ALL_REQUESTS #define BLOCKSIZE_MIN_LIMIT (64U * 1024) -#define MIN(a, b) ((a) < (b) ? (a) : (b)) /* As long as we don't have parallel requests, we can reuse a common * buffer for alignment purposes; size it to the maximum we allow for diff --git a/filters/cache/lru.c b/filters/cache/lru.c index c828968..9e20408 100644 --- a/filters/cache/lru.c +++ b/filters/cache/lru.c @@ -46,13 +46,12 @@ #include <nbdkit-filter.h> #include "bitmap.h" +#include "minmax.h" #include "cache.h" #include "blk.h" #include "lru.h" -#define MAX(a, b) ((a) > (b) ? (a) : (b)) - /* LRU bitmaps. These bitmaps implement a simple, fast LRU structure. * * bm[0] bm[1] blocks not in either bitmap diff --git a/filters/nozero/nozero.c b/filters/nozero/nozero.c index dac47ad..6a22f9b 100644 --- a/filters/nozero/nozero.c +++ b/filters/nozero/nozero.c @@ -42,9 +42,10 @@ #include <nbdkit-filter.h> +#include "minmax.h" + #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL -#define MIN(a, b) ((a) < (b) ? (a) : (b)) #define MAX_WRITE (64 * 1024 * 1024) static char buffer[MAX_WRITE]; diff --git a/plugins/pattern/pattern.c b/plugins/pattern/pattern.c index 1d1b234..32db381 100644 --- a/plugins/pattern/pattern.c +++ b/plugins/pattern/pattern.c @@ -45,6 +45,7 @@ #include <nbdkit-plugin.h> #include "byte-swapping.h" +#include "minmax.h" /* The size of disk in bytes (initialized by size=<SIZE> parameter). */ static int64_t size = 0; @@ -87,8 +88,6 @@ pattern_get_size (void *handle) return size; } -#define MIN(a, b) ((a) < (b) ? (a) : (b)) - /* Read data. */ static int pattern_pread (void *handle, void *buf, uint32_t count, uint64_t offset, diff --git a/common/include/Makefile.am b/common/include/Makefile.am index 3e7f056..7df4855 100644 --- a/common/include/Makefile.am +++ b/common/include/Makefile.am @@ -41,6 +41,7 @@ EXTRA_DIST = \ isaligned.h \ ispowerof2.h \ iszero.h \ + minmax.h \ nextnonzero.h \ random.h \ rounding.h diff --git a/filters/blocksize/Makefile.am b/filters/blocksize/Makefile.am index b49fc3f..231c9b5 100644 --- a/filters/blocksize/Makefile.am +++ b/filters/blocksize/Makefile.am @@ -41,7 +41,8 @@ nbdkit_blocksize_filter_la_SOURCES = \ $(top_srcdir)/include/nbdkit-filter.h nbdkit_blocksize_filter_la_CPPFLAGS = \ - -I$(top_srcdir)/include + -I$(top_srcdir)/include \ + -I$(top_srcdir)/common/include nbdkit_blocksize_filter_la_CFLAGS = \ $(WARNINGS_CFLAGS) nbdkit_blocksize_filter_la_LDFLAGS = \ diff --git a/filters/nozero/Makefile.am b/filters/nozero/Makefile.am index b1fd4a8..fe88233 100644 --- a/filters/nozero/Makefile.am +++ b/filters/nozero/Makefile.am @@ -41,7 +41,8 @@ nbdkit_nozero_filter_la_SOURCES = \ $(top_srcdir)/include/nbdkit-filter.h nbdkit_nozero_filter_la_CPPFLAGS = \ - -I$(top_srcdir)/include + -I$(top_srcdir)/include \ + -I$(top_srcdir)/common/include nbdkit_nozero_filter_la_CFLAGS = \ $(WARNINGS_CFLAGS) nbdkit_nozero_filter_la_LDFLAGS = \ -- 2.19.2
Richard W.M. Jones
2019-Jan-04 09:48 UTC
[Libguestfs] [PATCH nbdkit v5 3/3] cache: Implement cache-max-size and cache space reclaim.
The original plan was to have a background thread doing the reclaim. However that cannot work given the design of filters, because a background thread cannot access the next_ops struct which is only available during requests. Therefore we spread the work over the request threads. Each blk_* function checks whether there is work to do, and if there is will reclaim up to two blocks from the cache (thus ensuring that we always make forward progress, since each blk_* function can only add at most one block to the cache). Another large change is that the cache block size can no longer be fixed. We must use a block size which is at least as large as the filesystem block size so that FALLOC_FL_PUNCH_HOLE works. To do this, test the filesystem block size and set blksize dynamically to MAX (4096, filesystem block size). This also adds a test --- filters/cache/nbdkit-cache-filter.pod | 50 +++++++ filters/cache/cache.h | 14 +- filters/cache/reclaim.h | 53 +++++++ filters/cache/blk.c | 54 +++++-- filters/cache/cache.c | 95 ++++++++++-- filters/cache/lru.c | 11 +- filters/cache/reclaim.c | 207 ++++++++++++++++++++++++++ TODO | 11 +- filters/cache/Makefile.am | 2 + tests/Makefile.am | 2 + tests/test-cache-max-size.sh | 96 ++++++++++++ 11 files changed, 547 insertions(+), 48 deletions(-) diff --git a/filters/cache/nbdkit-cache-filter.pod b/filters/cache/nbdkit-cache-filter.pod index 09dc39f..ae2fb42 100644 --- a/filters/cache/nbdkit-cache-filter.pod +++ b/filters/cache/nbdkit-cache-filter.pod @@ -5,6 +5,9 @@ nbdkit-cache-filter - nbdkit caching filter =head1 SYNOPSIS nbdkit --filter=cache plugin [cache=writeback|writethrough|unsafe] + [cache-max-size=SIZE] + [cache-high-threshold=N] + [cache-low-threshold=N] [cache-on-read=true|false] [plugin-args...] @@ -51,6 +54,14 @@ This is dangerous and can cause data loss, but this may be acceptable if you only use it for testing or with data that you don't care about or can cheaply reconstruct. +=item B<cache-max-size=SIZE> + +=item B<cache-high-threshold=N> + +=item B<cache-low-threshold=N> + +Limit the size of the cache to C<SIZE>. See L</CACHE MAXIMUM SIZE> below. + =item B<cache-on-read=true> Cache read requests as well as write requests. Any time a block is @@ -69,6 +80,45 @@ Do not cache read requests (this is the default). =back +=head1 CACHE MAXIMUM SIZE + +By default the cache can grow to any size (although not larger than +the virtual size of the underlying plugin) and you have to ensure +there is sufficient space in C<$TMPDIR> for it. + +Using the parameters C<cache-max-size>, C<cache-high-threshold> and +C<cache-low-threshold> you can limit the maximum size of the cache. + +This requires kernel and filesystem support (for L<fallocate(2)> +C<FALLOC_FL_PUNCH_HOLE>), so it may not work on all platforms. + +Some examples: + +=over 4 + +=item C<cache-max-size=1G> + +The cache is limited to around 1 gigabyte. + +=item C<cache-max-size=1G cache-high-threshold=95 cache-low-threshold=80> + +Once the cache size reaches 972M (95% of 1G), blocks are reclaimed +from the cache until the size is reduced to 819M (80% of 1G). + +=back + +The way this works is once the size of the cache exceeds +S<C<SIZE> ✕ the high threshold>, the filter works to reduce the size +of the cache until it is less than S<C<SIZE> ✕ the low threshold>. +Once the size is below the low threshold, no more reclaim work is done +until the size exceeds the high threshold again. + +The default thresholds are high 95% and low 80%. You must set +S<0 E<lt> low E<lt> high>. The thresholds are expressed as integer +percentages of C<cache-max-size>. + +Least recently used blocks are discarded first. + =head1 ENVIRONMENT VARIABLES =over 4 diff --git a/filters/cache/cache.h b/filters/cache/cache.h index 50416b1..cafd421 100644 --- a/filters/cache/cache.h +++ b/filters/cache/cache.h @@ -34,12 +34,7 @@ #ifndef NBDKIT_CACHE_H #define NBDKIT_CACHE_H -/* Size of a block in the cache. A 4K block size means that we need - * 64 MB of memory to store the bitmaps for a 1 TB underlying image. - * It is also smaller than the usual hole size for sparse files, which - * means we have no reason to call next_ops->zero. - */ -#define BLKSIZE 4096 +#include <fcntl.h> /* Caching mode. */ extern enum cache_mode { @@ -48,6 +43,13 @@ extern enum cache_mode { CACHE_MODE_UNSAFE, } cache_mode; +/* Size of a block in the cache. */ +extern unsigned blksize; + +/* Maximum size of the cache and high/low thresholds. */ +extern int64_t max_size; +extern int hi_thresh, lo_thresh; + /* Cache read requests. */ extern bool cache_on_read; diff --git a/filters/cache/reclaim.h b/filters/cache/reclaim.h new file mode 100644 index 0000000..d0692b6 --- /dev/null +++ b/filters/cache/reclaim.h @@ -0,0 +1,53 @@ +/* nbdkit + * Copyright (C) 2018-2019 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#ifndef NBDKIT_RECLAIM_H +#define NBDKIT_RECLAIM_H + +#include "bitmap.h" + +/* Do we support reclaiming cache blocks? */ +#ifdef FALLOC_FL_PUNCH_HOLE +#define HAVE_CACHE_RECLAIM 1 +#else +#undef HAVE_CACHE_RECLAIM +#endif + +/* Check if we need to reclaim blocks, and if so reclaim up to two + * blocks. + * + * Note this must be called with the blk lock held. + */ +extern void reclaim (int fd, struct bitmap *bm); + +#endif /* NBDKIT_RECLAIM_H */ diff --git a/filters/cache/blk.c b/filters/cache/blk.c index b256446..34f1745 100644 --- a/filters/cache/blk.c +++ b/filters/cache/blk.c @@ -46,14 +46,17 @@ #include <unistd.h> #include <fcntl.h> #include <errno.h> +#include <sys/statvfs.h> #include <nbdkit-filter.h> #include "bitmap.h" +#include "minmax.h" #include "cache.h" #include "blk.h" #include "lru.h" +#include "reclaim.h" /* The cache. */ static int fd = -1; @@ -69,7 +72,7 @@ static int fd = -1; static struct bitmap bm; enum bm_entry { - BLOCK_NOT_CACHED = 0, + BLOCK_NOT_CACHED = 0, /* assumed to be zero by reclaim code */ BLOCK_CLEAN = 1, BLOCK_DIRTY = 3, }; @@ -80,10 +83,7 @@ blk_init (void) const char *tmpdir; size_t len; char *template; - - lru_init (); - - bitmap_init (&bm, BLKSIZE, 2 /* bits per block */); + struct statvfs statvfs; tmpdir = getenv ("TMPDIR"); if (!tmpdir) @@ -108,6 +108,24 @@ blk_init (void) unlink (template); + /* Choose the block size. + * + * A 4K block size means that we need 64 MB of memory to store the + * bitmaps for a 1 TB underlying image. However to support + * hole-punching (for reclaiming) we need the block size to be at + * least as large as the filesystem block size. + */ + if (fstatvfs (fd, &statvfs) == -1) { + nbdkit_error ("fstatvfs: %s: %m", tmpdir); + return -1; + } + blksize = MAX (4096, statvfs.f_bsize); + nbdkit_debug ("cache: block size: %u", blksize); + + bitmap_init (&bm, blksize, 2 /* bits per block */); + + lru_init (); + return 0; } @@ -143,9 +161,11 @@ int blk_read (struct nbdkit_next_ops *next_ops, void *nxdata, uint64_t blknum, uint8_t *block, int *err) { - off_t offset = blknum * BLKSIZE; + off_t offset = blknum * blksize; enum bm_entry state = bitmap_get_blk (&bm, blknum, BLOCK_NOT_CACHED); + reclaim (fd, &bm); + nbdkit_debug ("cache: blk_read block %" PRIu64 " (offset %" PRIu64 ") is %s", blknum, (uint64_t) offset, state == BLOCK_NOT_CACHED ? "not cached" : @@ -154,18 +174,18 @@ blk_read (struct nbdkit_next_ops *next_ops, void *nxdata, "unknown"); if (state == BLOCK_NOT_CACHED) { /* Read underlying plugin. */ - if (next_ops->pread (nxdata, block, BLKSIZE, offset, 0, err) == -1) + if (next_ops->pread (nxdata, block, blksize, offset, 0, err) == -1) return -1; /* If cache-on-read, copy the block to the cache. */ if (cache_on_read) { - off_t offset = blknum * BLKSIZE; + off_t offset = blknum * blksize; nbdkit_debug ("cache: cache-on-read block %" PRIu64 " (offset %" PRIu64 ")", blknum, (uint64_t) offset); - if (pwrite (fd, block, BLKSIZE, offset) == -1) { + if (pwrite (fd, block, blksize, offset) == -1) { *err = errno; nbdkit_error ("pwrite: %m"); return -1; @@ -176,7 +196,7 @@ blk_read (struct nbdkit_next_ops *next_ops, void *nxdata, return 0; } else { /* Read cache. */ - if (pread (fd, block, BLKSIZE, offset) == -1) { + if (pread (fd, block, blksize, offset) == -1) { *err = errno; nbdkit_error ("pread: %m"); return -1; @@ -191,18 +211,20 @@ blk_writethrough (struct nbdkit_next_ops *next_ops, void *nxdata, uint64_t blknum, const uint8_t *block, uint32_t flags, int *err) { - off_t offset = blknum * BLKSIZE; + off_t offset = blknum * blksize; + + reclaim (fd, &bm); nbdkit_debug ("cache: writethrough block %" PRIu64 " (offset %" PRIu64 ")", blknum, (uint64_t) offset); - if (pwrite (fd, block, BLKSIZE, offset) == -1) { + if (pwrite (fd, block, blksize, offset) == -1) { *err = errno; nbdkit_error ("pwrite: %m"); return -1; } - if (next_ops->pwrite (nxdata, block, BLKSIZE, offset, flags, err) == -1) + if (next_ops->pwrite (nxdata, block, blksize, offset, flags, err) == -1) return -1; bitmap_set_blk (&bm, blknum, BLOCK_CLEAN); @@ -222,12 +244,14 @@ blk_write (struct nbdkit_next_ops *next_ops, void *nxdata, (cache_mode == CACHE_MODE_WRITEBACK && (flags & NBDKIT_FLAG_FUA))) return blk_writethrough (next_ops, nxdata, blknum, block, flags, err); - offset = blknum * BLKSIZE; + offset = blknum * blksize; + + reclaim (fd, &bm); nbdkit_debug ("cache: writeback block %" PRIu64 " (offset %" PRIu64 ")", blknum, (uint64_t) offset); - if (pwrite (fd, block, BLKSIZE, offset) == -1) { + if (pwrite (fd, block, blksize, offset) == -1) { *err = errno; nbdkit_error ("pwrite: %m"); return -1; diff --git a/filters/cache/cache.c b/filters/cache/cache.c index 86d66bb..7be363f 100644 --- a/filters/cache/cache.c +++ b/filters/cache/cache.c @@ -56,6 +56,7 @@ #include "cache.h" #include "blk.h" +#include "reclaim.h" #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL @@ -64,7 +65,10 @@ */ static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; +unsigned blksize; enum cache_mode cache_mode = CACHE_MODE_WRITEBACK; +int64_t max_size = -1; +int hi_thresh = 95, lo_thresh = 80; bool cache_on_read = false; static int cache_flush (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, uint32_t flags, int *err); @@ -105,6 +109,53 @@ cache_config (nbdkit_next_config *next, void *nxdata, return -1; } } +#ifdef HAVE_CACHE_RECLAIM + else if (strcmp (key, "cache-max-size") == 0) { + int64_t r; + + r = nbdkit_parse_size (value); + if (r == -1) + return -1; + /* We set a lower limit for the cache size just to keep out of + * trouble. + */ + if (r < 1024*1024) { + nbdkit_error ("cache-max-size is too small"); + return -1; + } + max_size = r; + return 0; + } + else if (strcmp (key, "cache-high-threshold") == 0) { + if (sscanf (value, "%d", &hi_thresh) != 1) { + nbdkit_error ("invalid cache-high-threshold parameter: %s", value); + return -1; + } + if (hi_thresh <= 0) { + nbdkit_error ("cache-high-threshold must be greater than zero"); + return -1; + } + return 0; + } + else if (strcmp (key, "cache-low-threshold") == 0) { + if (sscanf (value, "%d", &lo_thresh) != 1) { + nbdkit_error ("invalid cache-low-threshold parameter: %s", value); + return -1; + } + if (lo_thresh <= 0) { + nbdkit_error ("cache-low-threshold must be greater than zero"); + return -1; + } + return 0; + } +#else /* !HAVE_CACHE_RECLAIM */ + else if (strcmp (key, "cache-max-size") == 0 || + strcmp (key, "cache-high-threshold") == 0 || + strcmp (key, "cache-low-threshold") == 0) { + nbdkit_error ("this platform does not support cache reclaim"); + return -1; + } +#endif /* !HAVE_CACHE_RECLAIM */ else if (strcmp (key, "cache-on-read") == 0) { int r; @@ -119,6 +170,21 @@ cache_config (nbdkit_next_config *next, void *nxdata, } } +static int +cache_config_complete (nbdkit_next_config_complete *next, void *nxdata) +{ + /* If cache-max-size was set then check the thresholds. */ + if (max_size != -1) { + if (lo_thresh >= hi_thresh) { + nbdkit_error ("cache-low-threshold must be " + "less than cache-high-threshold"); + return -1; + } + } + + return next (nxdata); +} + static void * cache_open (nbdkit_next_open *next, void *nxdata, int readonly) { @@ -176,7 +242,7 @@ cache_pread (struct nbdkit_next_ops *next_ops, void *nxdata, uint8_t *block; assert (!flags); - block = malloc (BLKSIZE); + block = malloc (blksize); if (block == NULL) { *err = errno; nbdkit_error ("malloc: %m"); @@ -187,9 +253,9 @@ cache_pread (struct nbdkit_next_ops *next_ops, void *nxdata, uint64_t blknum, blkoffs, n; int r; - blknum = offset / BLKSIZE; /* block number */ - blkoffs = offset % BLKSIZE; /* offset within the block */ - n = BLKSIZE - blkoffs; /* max bytes we can read from this block */ + blknum = offset / blksize; /* block number */ + blkoffs = offset % blksize; /* offset within the block */ + n = blksize - blkoffs; /* max bytes we can read from this block */ if (n > count) n = count; @@ -221,7 +287,7 @@ cache_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, uint8_t *block; bool need_flush = false; - block = malloc (BLKSIZE); + block = malloc (blksize); if (block == NULL) { *err = errno; nbdkit_error ("malloc: %m"); @@ -237,9 +303,9 @@ cache_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, uint64_t blknum, blkoffs, n; int r; - blknum = offset / BLKSIZE; /* block number */ - blkoffs = offset % BLKSIZE; /* offset within the block */ - n = BLKSIZE - blkoffs; /* max bytes we can read from this block */ + blknum = offset / blksize; /* block number */ + blkoffs = offset % blksize; /* offset within the block */ + n = blksize - blkoffs; /* max bytes we can read from this block */ if (n > count) n = count; @@ -278,14 +344,14 @@ cache_zero (struct nbdkit_next_ops *next_ops, void *nxdata, uint8_t *block; bool need_flush = false; - block = malloc (BLKSIZE); + block = malloc (blksize); if (block == NULL) { *err = errno; nbdkit_error ("malloc: %m"); return -1; } - flags &= ~NBDKIT_FLAG_MAY_TRIM; /* See BLKSIZE comment above. */ + flags &= ~NBDKIT_FLAG_MAY_TRIM; /* See blksize comment above. */ if ((flags & NBDKIT_FLAG_FUA) && next_ops->can_fua (nxdata) == NBDKIT_FUA_EMULATE) { flags &= ~NBDKIT_FLAG_FUA; @@ -295,9 +361,9 @@ cache_zero (struct nbdkit_next_ops *next_ops, void *nxdata, uint64_t blknum, blkoffs, n; int r; - blknum = offset / BLKSIZE; /* block number */ - blkoffs = offset % BLKSIZE; /* offset within the block */ - n = BLKSIZE - blkoffs; /* max bytes we can read from this block */ + blknum = offset / blksize; /* block number */ + blkoffs = offset % blksize; /* offset within the block */ + n = blksize - blkoffs; /* max bytes we can read from this block */ if (n > count) n = count; @@ -351,7 +417,7 @@ cache_flush (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, assert (!flags); /* Allocate the bounce buffer. */ - data.block = malloc (BLKSIZE); + data.block = malloc (blksize); if (data.block == NULL) { *err = errno; nbdkit_error ("malloc: %m"); @@ -411,6 +477,7 @@ static struct nbdkit_filter filter = { .load = cache_load, .unload = cache_unload, .config = cache_config, + .config_complete = cache_config_complete, .open = cache_open, .prepare = cache_prepare, .get_size = cache_get_size, diff --git a/filters/cache/lru.c b/filters/cache/lru.c index 9e20408..90352b5 100644 --- a/filters/cache/lru.c +++ b/filters/cache/lru.c @@ -90,8 +90,8 @@ static unsigned N = 100; void lru_init (void) { - bitmap_init (&bm[0], BLKSIZE, 1 /* bits per block */); - bitmap_init (&bm[1], BLKSIZE, 1 /* bits per block */); + bitmap_init (&bm[0], blksize, 1 /* bits per block */); + bitmap_init (&bm[1], blksize, 1 /* bits per block */); } void @@ -109,8 +109,11 @@ lru_set_size (uint64_t new_size) if (bitmap_resize (&bm[1], new_size) == -1) return -1; - /* XXX Choose this better. */ - N = MAX (new_size / BLKSIZE / 4, 100); + if (max_size != -1) + /* Make the threshold about 1/4 the maximum size of the cache. */ + N = MAX (max_size / blksize / 4, 100); + else + N = MAX (new_size / blksize / 4, 100); return 0; } diff --git a/filters/cache/reclaim.c b/filters/cache/reclaim.c new file mode 100644 index 0000000..9b1fac5 --- /dev/null +++ b/filters/cache/reclaim.c @@ -0,0 +1,207 @@ +/* nbdkit + * Copyright (C) 2018-2019 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <stdbool.h> +#include <inttypes.h> +#include <string.h> +#include <unistd.h> +#include <fcntl.h> +#include <sys/types.h> +#include <sys/stat.h> + +#include <nbdkit-filter.h> + +#include "bitmap.h" + +#include "cache.h" +#include "reclaim.h" +#include "lru.h" + +#ifndef HAVE_CACHE_RECLAIM + +void +reclaim (int fd, struct bitmap *bm) +{ + /* nothing */ +} + +#else /* HAVE_CACHE_RECLAIM */ + +/* If we are currently reclaiming blocks from the cache. + * + * The state machine starts in the NOT_RECLAIMING state. When the + * size of the cache exceeds the high threshold, we move to + * RECLAIMING_LRU. Once we have exhausted all LRU blocks, we move to + * RECLAIMING_ANY (reclaiming any blocks). + * + * If at any time the size of the cache goes below the low threshold + * we move back to the NOT_RECLAIMING state. + * + * A possible future enhancement is to add an extra state between LRU + * and ANY which reclaims blocks from lru.c:bm[1]. + * + * reclaim_blk is the last block that we looked at. + */ +enum reclaim_state { + NOT_RECLAIMING = 0, + RECLAIMING_LRU = 1, + RECLAIMING_ANY = 2, +}; + +static enum reclaim_state reclaiming = NOT_RECLAIMING; +static uint64_t reclaim_blk; + +static void reclaim_one (int fd, struct bitmap *bm); +static void reclaim_lru (int fd, struct bitmap *bm); +static void reclaim_any (int fd, struct bitmap *bm); +static void reclaim_block (int fd, struct bitmap *bm); + +void +reclaim (int fd, struct bitmap *bm) +{ + struct stat statbuf; + uint64_t cache_allocated; + + /* If the user didn't set cache-max-size, do nothing. */ + if (max_size == -1) return; + + /* Check the allocated size of the cache. */ + if (fstat (fd, &statbuf) == -1) { + nbdkit_debug ("cache: fstat: %m"); + return; + } + cache_allocated = statbuf.st_blocks * UINT64_C(512); + + if (reclaiming) { + /* Keep reclaiming until the cache size drops below the low threshold. */ + if (cache_allocated < max_size * lo_thresh / 100) { + nbdkit_debug ("cache: stop reclaiming"); + reclaiming = NOT_RECLAIMING; + return; + } + } + else { + if (cache_allocated < max_size * hi_thresh / 100) + return; + + /* Start reclaiming if the cache size goes over the high threshold. */ + nbdkit_debug ("cache: start reclaiming"); + reclaiming = RECLAIMING_LRU; + } + + /* Reclaim up to 2 cache blocks. */ + reclaim_one (fd, bm); + reclaim_one (fd, bm); +} + +/* Reclaim a single cache block. */ +static void +reclaim_one (int fd, struct bitmap *bm) +{ + assert (reclaiming); + + if (reclaiming == RECLAIMING_LRU) + reclaim_lru (fd, bm); + else + reclaim_any (fd, bm); +} + +static void +reclaim_lru (int fd, struct bitmap *bm) +{ + uint64_t old_reclaim_blk; + + /* Find the next block in the cache. */ + reclaim_blk = bitmap_next (bm, reclaim_blk+1); + old_reclaim_blk = reclaim_blk; + + /* Search for an LRU block after this one. */ + do { + if (! lru_has_been_recently_accessed (reclaim_blk)) { + reclaim_block (fd, bm); + return; + } + + reclaim_blk = bitmap_next (bm, reclaim_blk+1); + if (reclaim_blk == -1) /* wrap around */ + reclaim_blk = bitmap_next (bm, 0); + } while (reclaim_blk >= 0 && old_reclaim_blk != reclaim_blk); + + if (old_reclaim_blk == reclaim_blk) { + /* Run out of LRU blocks, so start reclaiming any block in the cache. */ + nbdkit_debug ("cache: reclaiming any blocks"); + reclaiming = RECLAIMING_ANY; + reclaim_any (fd, bm); + } +} + +static void +reclaim_any (int fd, struct bitmap *bm) +{ + /* Find the next block in the cache. */ + reclaim_blk = bitmap_next (bm, reclaim_blk+1); + if (reclaim_blk == -1) /* wrap around */ + reclaim_blk = bitmap_next (bm, 0); + + reclaim_block (fd, bm); +} + +static void +reclaim_block (int fd, struct bitmap *bm) +{ + if (reclaim_blk == -1) { + nbdkit_debug ("cache: run out of blocks to reclaim!"); + return; + } + + nbdkit_debug ("cache: reclaiming block %" PRIu64, reclaim_blk); +#ifdef FALLOC_FL_PUNCH_HOLE + if (fallocate (fd, FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE, + reclaim_blk * blksize, blksize) == -1) { + nbdkit_error ("cache: reclaiming cache blocks: " + "fallocate: FALLOC_FL_PUNCH_HOLE: %m"); + return; + } +#else +#error "no implementation for punching holes" +#endif + + bitmap_set_blk (bm, reclaim_blk, 0); +} + +#endif /* HAVE_CACHE_RECLAIM */ diff --git a/TODO b/TODO index 9d6e727..7107ea9 100644 --- a/TODO +++ b/TODO @@ -100,15 +100,8 @@ Suggestions for filters * masking plugin features for testing clients (see 'nozero' and 'fua' filters for examples) -nbdkit-cache-filter needs considerable work: - -* allow the user to limit the total size of the cache - -* handle ENOSPC errors - -* implement some cache replacement policies - -* some sort of background task or thread to write out dirty blocks +* nbdkit-cache-filter should handle ENOSPC errors automatically by + reclaiming blocks from the cache Composing nbdkit ---------------- diff --git a/filters/cache/Makefile.am b/filters/cache/Makefile.am index 1620fa1..1d167aa 100644 --- a/filters/cache/Makefile.am +++ b/filters/cache/Makefile.am @@ -43,6 +43,8 @@ nbdkit_cache_filter_la_SOURCES = \ cache.h \ lru.c \ lru.h \ + reclaim.c \ + reclaim.h \ $(top_srcdir)/include/nbdkit-filter.h nbdkit_cache_filter_la_CPPFLAGS = \ diff --git a/tests/Makefile.am b/tests/Makefile.am index 03cd206..876a5fc 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -42,6 +42,7 @@ EXTRA_DIST = \ shebang.rb \ test-blocksize.sh \ test-cache.sh \ + test-cache-max-size.sh \ test-cache-on-read.sh \ test-captive.sh \ test-cow.sh \ @@ -711,6 +712,7 @@ TESTS += \ test-cache.sh \ test-cache-on-read.sh endif HAVE_GUESTFISH +TESTS += test-cache-max-size.sh # cow filter test. if HAVE_GUESTFISH diff --git a/tests/test-cache-max-size.sh b/tests/test-cache-max-size.sh new file mode 100755 index 0000000..f7e743a --- /dev/null +++ b/tests/test-cache-max-size.sh @@ -0,0 +1,96 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2018 Red Hat Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +source ./functions.sh +set -e +set -x + +# Check that this is a Linux-like system supporting /proc/$pid/fd. +if ! test -d /proc/self/fd; then + echo "$0: not a Linux-like system supporting /proc/\$pid/fd" + exit 77 +fi + +# Test that qemu-io works +if ! qemu-io --help >/dev/null; then + echo "$0: missing or broken qemu-io" + exit 77 +fi + +# Need the stat command from coreutils. +if ! stat --version >/dev/null; then + echo "$0: missing or broken stat command" + exit 77 +fi + +d=cache-max-size.d +rm -rf $d +mkdir -p $d +cleanup_fn rm -rf $d + +# Create a cache directory. +mkdir $d/cache +TMPDIR=$d/cache +export TMPDIR + +# Create an empty base image. +truncate -s 1G $d/cache-max-size.img + +# Run nbdkit with the caching filter and a low size limit to ensure +# that the reclaim code is exercised. +start_nbdkit -P $d/cache-max-size.pid -U $d/cache-max-size.sock \ + --filter=cache \ + file $d/cache-max-size.img \ + cache-max-size=10M cache-on-read=true + +# Write > 10M to the plugin. +qemu-io -f raw "nbd+unix://?socket=$d/cache-max-size.sock" \ + -c "w -P 0 0 10M" \ + -c "w -P 0 20M 10M" \ + -c "r 20M 10M" \ + -c "r 30M 10M" \ + -c "w -P 0 40M 10M" + +# We can't use ‘du’ on the cache directory because the cache file is +# deleted by the filter, and so is only accessible via /proc/$pid/fd. +# Get the /proc link to the cache file, and the size of it in bytes. +fddir="/proc/$( cat $d/cache-max-size.pid )/fd" +ls -l $fddir +fd="$fddir/$( ls -l $fddir | grep $TMPDIR | head -1 | awk '{print $9}' )" +stat -L $fd +size=$(( $(stat -L -c '%b * %B' $fd) )) + +if [ "$size" -gt $(( 11 * 1024 * 1024 )) ]; then + echo "$0: cache size is larger than 10M (actual size: $size bytes)" + exit 1 +fi -- 2.19.2
Eric Blake
2019-Jan-04 14:51 UTC
Re: [Libguestfs] [PATCH nbdkit v5 2/3] common/include: Add generic MIN and MAX macros.
On 1/4/19 3:48 AM, Richard W.M. Jones wrote:> The preferred implementation uses __auto_type, a GCC extension also > now supported by Clang. > > Unfortunately OpenBSD ships with GCC 4.2.1 (from 2007!) which predates > this extension by quite a few years, so we have to be able to fall > back to a plain macro.Not quite. A quick google search found that at least gcc 3.3.6 supported typeof, and therefore 4.2.1 does as well (I'm not sure how far back that extension goes - it's surprisingly hard to google for which gcc version introduced a specific extension, and I merely got lucky that my google search hit a copy of the 3.3.6 docs. I suppose a crawl through gcc.git history is going to be the most accurate way to find when which extensions were added). At any rate, typeof is better than the naive plain macro. But you ARE right that __auto_type is nicer than typeof (smaller macro expansion, and deals with variably modified types correctly - although we don't use those to worry about that latter point).> +++ b/configure.ac > @@ -156,9 +156,27 @@ you may be using a C compiler which does not support this attribute, > or the configure test may be wrong. > > This code requires the attribute to work for proper locking between threads.])]) > -dnl restore CFLAGS > CFLAGS="${acx_nbdkit_save_CFLAGS}" > > +dnl Check for __auto_type (GCC extension). > +AC_MSG_CHECKING([if __auto_type is available in this compiler]) > +AC_COMPILE_IFELSE([ > +AC_LANG_SOURCE([[ > +static int > +test (int a) > +{ > + __auto_type at = a; > + return at; > +} > +]]) > + ],[ > + AC_MSG_RESULT([yes]) > + AC_DEFINE([HAVE_AUTO_TYPE],[1],[__auto_type is available]) > + ],[ > + AC_MSG_RESULT([no])If desired, we could also make the negative branch test that typeof is supported, rather than blindly using it - then again; we rely on enough other gcc extensions (and mandate at least automatic cleanup support) that we are probably safe assuming that typeof works if __auto_type is not present.> +#ifdef HAVE_AUTO_TYPE > + > +/* __auto_type is a GCC extension, added in GCC 4.9 in 2014, and to > + * Clang in 2015. > + */ > + > +#define MIN(x, y) ({ \ > + __auto_type _x = (x); \ > + __auto_type _y = (y); \ > + _x < _y ? _x : _y; \ > + }) > +#define MAX(x, y) ({ \ > + __auto_type _x = (x); \ > + __auto_type _y = (y); \ > + _x > _y ? _x : _y; \ > + }) > + > +#else > + > +#define MIN(x, y) ((x) < (y) ? (x) : (y)) > +#define MAX(x, y) ((x) > (y) ? (x) : (y))So I'd replace these with: #define MIN(x, y) ({ \ typeof (a) _a = (a); \ typeof (b) _b = (b); \ _a < _b ? _a : _b; \ }) and similar for MAX. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Jan-04 15:05 UTC
Re: [Libguestfs] [PATCH nbdkit v5 3/3] cache: Implement cache-max-size and cache space reclaim.
On 1/4/19 3:48 AM, Richard W.M. Jones wrote:> The original plan was to have a background thread doing the reclaim. > However that cannot work given the design of filters, because a > background thread cannot access the next_ops struct which is only > available during requests. > > Therefore we spread the work over the request threads. Each blk_* > function checks whether there is work to do, and if there is will > reclaim up to two blocks from the cache (thus ensuring that we always > make forward progress, since each blk_* function can only add at most > one block to the cache). > > Another large change is that the cache block size can no longer be > fixed. We must use a block size which is at least as large as the > filesystem block size so that FALLOC_FL_PUNCH_HOLE works. To do this, > test the filesystem block size and set blksize dynamically to > MAX (4096, filesystem block size). > > This also adds a testIncomplete sentence?> ---> +++ b/filters/cache/cache.h > @@ -34,12 +34,7 @@ > #ifndef NBDKIT_CACHE_H > #define NBDKIT_CACHE_H > > -/* Size of a block in the cache. A 4K block size means that we need > - * 64 MB of memory to store the bitmaps for a 1 TB underlying image. > - * It is also smaller than the usual hole size for sparse files, which > - * means we have no reason to call next_ops->zero. > - */ > -#define BLKSIZE 4096This comment is now gone...> @@ -278,14 +344,14 @@ cache_zero (struct nbdkit_next_ops *next_ops, void *nxdata, > uint8_t *block; > bool need_flush = false; > > - block = malloc (BLKSIZE); > + block = malloc (blksize); > if (block == NULL) { > *err = errno; > nbdkit_error ("malloc: %m"); > return -1; > } > > - flags &= ~NBDKIT_FLAG_MAY_TRIM; /* See BLKSIZE comment above. */ > + flags &= ~NBDKIT_FLAG_MAY_TRIM; /* See blksize comment above. */...which makes this comment stale. What's more, if we are able to punch holes, maybe we should consider using a fourth mode in our bitmap. Right now we have 00 for uncached, 01 for clean, 11 for dirty (and allocated); we could add 10 for dirty-but-known-zero by punching a hole locally at the time of the client's NBD_CMD_WRITE_ZERO with MAY_TRIM, then later calling into next_ops->zero() when flushing a known-zero back to the underlying plugin, all since we are now sized for holes. But other than cleaning up the comment, the rest of this paragraph deserves a separate commit. I'm not spotting any obvious problems now. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Apparently Analagous Threads
- [PATCH nbdkit] common/include: Add generic MIN and MAX macros.
- [PATCH nbdkit v5 3/3] cache: Implement cache-max-size and cache space reclaim.
- [nbdkit PATCH 3/3] filters: Use only .thread_model, not THREAD_MODEL
- Re: [nbdkit] [filter/nozero] large binary size with GCC 9
- [nbdkit PATCH v2 24/24] nocache: Implement new filter