Eric Blake
2019-Apr-27 21:26 UTC
[Libguestfs] [nbdkit PATCH 0/4] Fix truncate handling of real_size
While working on adding assertions to pthread_mutex_lock calls, I noticed that the truncate filter's use of mutex didn't really protect us, and isn't really necessary. Cleaning that up also spotted a couple of other potential cleanups. Eric Blake (4): filters: Drop useless .open callbacks truncate: Fix corruption when plugin changes per-connection size truncate: Test for safe multi-connect size handling doc: More details on (lack of) dynamic sizing docs/nbdkit-plugin.pod | 9 ++ filters/cache/cache.c | 10 --- filters/error/error.c | 10 --- filters/truncate/truncate.c | 162 +++++++++++++++++++----------------- tests/Makefile.am | 2 + tests/test-truncate4.sh | 85 +++++++++++++++++++ 6 files changed, 183 insertions(+), 95 deletions(-) create mode 100755 tests/test-truncate4.sh -- 2.20.1
Eric Blake
2019-Apr-27 21:26 UTC
[Libguestfs] [nbdkit PATCH 1/4] filters: Drop useless .open callbacks
The cache filter .open has never done any useful work; since its introduction in c10d126f, it appears to exist purely as copy-and-paste from the cow filter (differing only in whether the readonly parameter is massaged before passing on to next). The error filter .open used to track a per-connection handle, but for testing purposes, it was changed to use only global state in commit b33ccbb8. Drop these two .open callbacks in favor of using nbdkit's default behavior, with no semantic change. All other filters with an .open callback either massage parameters passed to next, create a per-connection handle, or both. Signed-off-by: Eric Blake <eblake@redhat.com> --- filters/cache/cache.c | 10 ---------- filters/error/error.c | 10 ---------- 2 files changed, 20 deletions(-) diff --git a/filters/cache/cache.c b/filters/cache/cache.c index b3fef42..19ce555 100644 --- a/filters/cache/cache.c +++ b/filters/cache/cache.c @@ -186,15 +186,6 @@ cache_config_complete (nbdkit_next_config_complete *next, void *nxdata) return next (nxdata); } -static void * -cache_open (nbdkit_next_open *next, void *nxdata, int readonly) -{ - if (next (nxdata, readonly) == -1) - return NULL; - - return NBDKIT_HANDLE_NOT_NEEDED; -} - /* Get the file size and ensure the cache is the correct size. */ static int64_t cache_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -476,7 +467,6 @@ static struct nbdkit_filter filter = { .unload = cache_unload, .config = cache_config, .config_complete = cache_config_complete, - .open = cache_open, .prepare = cache_prepare, .get_size = cache_get_size, .pread = cache_pread, diff --git a/filters/error/error.c b/filters/error/error.c index add7566..8932292 100644 --- a/filters/error/error.c +++ b/filters/error/error.c @@ -252,15 +252,6 @@ error_config (nbdkit_next_config *next, void *nxdata, "error-pread*, error-pwrite*, error-trim*, error-zero*, error-extents*\n" \ " Apply settings only to read/write/etc" -static void * -error_open (nbdkit_next_open *next, void *nxdata, int readonly) -{ - if (next (nxdata, readonly) == -1) - return NULL; - - return NBDKIT_HANDLE_NOT_NEEDED; -} - /* This function injects a random error. */ static bool random_error (const struct error_settings *error_settings, @@ -366,7 +357,6 @@ static struct nbdkit_filter filter = { .unload = error_unload, .config = error_config, .config_help = error_config_help, - .open = error_open, .pread = error_pread, .pwrite = error_pwrite, .trim = error_trim, -- 2.20.1
Eric Blake
2019-Apr-27 21:26 UTC
[Libguestfs] [nbdkit PATCH 2/4] truncate: Fix corruption when plugin changes per-connection size
The truncate filter tried to be careful to lock access to setting or reading the real_size variable learned from calling next_ops->get_size, in anticipation of not behaving incorrectly if the NBD protocol makes dynamic resize supported, and where the global variable could serve as a cache rather than having to always call next_ops->get_size to recompute things. However, nbdkit-plugin.pod already documents that .get_size and other option negotiation callbacks may return different values on a per-connection basis, but that within a connection those results should be constant because they may be cached. And our lock does not prevent the following race: thread A thread B .prepare - lock - next_ops->get_size returns A - set real_size based on A - unlock .pread .prepare - lock - next_ops->get_size returns B - set real_size based on B - unlock - lock - set real_size_copy based on B - unlock - decide whether to call next_ops->pread using wrong real_size If we are worried that next_ops->get_size() can change (even if our current documentation says it should not), then we must call it every time, rather than relying on an older cached version of the size; conversely, if we are trying to cache things, then we are better off caching it in a per-connection handle instead of globally between connections. Until the NBD protocol provides a way to advertise new sizes to clients, then our per-connection handle is effectively readonly after .prepare, so we don't even have to worry about locks (and if we DO want correct locking, it would be better to have a rwlock with get_size holding write and all other ops holding a read for the duration of the call, rather than just for the moment it copies from a global variable). The next commit abuses the file plugin to demonstrate the above race where a global size cache changed by connection B can affect the behavior seen by connection A. Signed-off-by: Eric Blake <eblake@redhat.com> --- filters/truncate/truncate.c | 162 +++++++++++++++++++----------------- 1 file changed, 87 insertions(+), 75 deletions(-) diff --git a/filters/truncate/truncate.c b/filters/truncate/truncate.c index 5f3370d..dfa9105 100644 --- a/filters/truncate/truncate.c +++ b/filters/truncate/truncate.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018 Red Hat Inc. + * Copyright (C) 2018-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -54,15 +54,6 @@ static int64_t truncate_size = -1; static unsigned round_up = 0, round_down = 0; -/* The real size of the underlying plugin. */ -static uint64_t real_size; - -/* The calculated size after applying the parameters. */ -static uint64_t size; - -/* This lock protects the real_size and size fields. */ -static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; - static int parse_round_param (const char *key, const char *value, unsigned *ret) { @@ -121,51 +112,90 @@ truncate_config (nbdkit_next_config *next, void *nxdata, "round-up=<N> Round up to next multiple of N.\n" \ "round-down=<N> Round down to multiple of N." -static int64_t truncate_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); +/* Per-connection state. Until the NBD protocol gains dynamic resize + * support, each connection remembers the size of the underlying + * plugin at open (even if that size differs between connections + * because the plugin tracks external resize effects). + */ +struct handle { + /* The real size of the underlying plugin. */ + uint64_t real_size; -/* In prepare, force a call to get_size which sets the real_size & size - * globals. + /* The calculated size after applying the parameters. */ + uint64_t size; +}; + +/* Open a connection. */ +static void * +truncate_open (nbdkit_next_open *next, void *nxdata, int readonly) +{ + struct handle *h; + + if (next (nxdata, readonly) == -1) + return NULL; + + h = malloc (sizeof *h); /* h is populated during .prepare */ + if (h == NULL) { + nbdkit_error ("malloc: %m"); + return NULL; + } + + return h; +} + +static void +truncate_close (void *handle) +{ + struct handle *h = handle; + + free (h); +} + +/* In prepare, force a call to next_ops->get_size in order to set + * per-connection real_size & size; these values are not changed + * during the life of the connection. */ static int truncate_prepare (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) { int64_t r; - - r = truncate_get_size (next_ops, nxdata, handle); - return r >= 0 ? 0 : -1; -} - -/* Get the size. As a side effect, calculate the size to serve. */ -static int64_t -truncate_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, - void *handle) -{ - int64_t r, ret; + struct handle *h = handle; r = next_ops->get_size (nxdata); if (r == -1) return -1; - pthread_mutex_lock (&lock); - - real_size = size = r; + h->real_size = h->size = r; /* The truncate, round-up and round-down parameters are treated as * separate operations. It's possible to specify more than one, * although perhaps not very useful. */ if (truncate_size >= 0) - size = truncate_size; + h->size = truncate_size; if (round_up > 0) - size = ROUND_UP (size, round_up); + h->size = ROUND_UP (h->size, round_up); if (round_down > 0) - size = ROUND_DOWN (size, round_down); - ret = size; + h->size = ROUND_DOWN (h->size, round_down); - pthread_mutex_unlock (&lock); + return r >= 0 ? 0 : -1; +} - return ret; +/* Get the size. As a side effect, calculate the size to serve. */ +static int64_t +truncate_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + struct handle *h = handle; + + /* If the NBD protocol and nbdkit adds dynamic resize, we'll need a + * rwlock where get_size holds write lock and all other ops hold + * read lock. Until then, NBD sizes are unchanging (even if the + * underlying plugin can react to external size changes), so just + * returned what we cached at connection open. + */ + return h->size; } /* Read data. */ @@ -176,17 +206,13 @@ truncate_pread (struct nbdkit_next_ops *next_ops, void *nxdata, { int r; uint32_t n; - uint64_t real_size_copy; + struct handle *h = handle; - pthread_mutex_lock (&lock); - real_size_copy = real_size; - pthread_mutex_unlock (&lock); - - if (offset < real_size_copy) { - if (offset + count <= real_size_copy) + if (offset < h->real_size) { + if (offset + count <= h->real_size) n = count; else - n = real_size_copy - offset; + n = h->real_size - offset; r = next_ops->pread (nxdata, buf, n, offset, flags, err); if (r == -1) return -1; @@ -209,17 +235,13 @@ truncate_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, { int r; uint32_t n; - uint64_t real_size_copy; + struct handle *h = handle; - pthread_mutex_lock (&lock); - real_size_copy = real_size; - pthread_mutex_unlock (&lock); - - if (offset < real_size_copy) { - if (offset + count <= real_size_copy) + if (offset < h->real_size) { + if (offset + count <= h->real_size) n = count; else - n = real_size_copy - offset; + n = h->real_size - offset; r = next_ops->pwrite (nxdata, buf, n, offset, flags, err); if (r == -1) return -1; @@ -246,17 +268,13 @@ truncate_trim (struct nbdkit_next_ops *next_ops, void *nxdata, uint32_t flags, int *err) { uint32_t n; - uint64_t real_size_copy; + struct handle *h = handle; - pthread_mutex_lock (&lock); - real_size_copy = real_size; - pthread_mutex_unlock (&lock); - - if (offset < real_size_copy) { - if (offset + count <= real_size_copy) + if (offset < h->real_size) { + if (offset + count <= h->real_size) n = count; else - n = real_size_copy - offset; + n = h->real_size - offset; return next_ops->trim (nxdata, n, offset, flags, err); } return 0; @@ -269,17 +287,13 @@ truncate_zero (struct nbdkit_next_ops *next_ops, void *nxdata, uint32_t flags, int *err) { uint32_t n; - uint64_t real_size_copy; + struct handle *h = handle; - pthread_mutex_lock (&lock); - real_size_copy = real_size; - pthread_mutex_unlock (&lock); - - if (offset < real_size_copy) { - if (offset + count <= real_size_copy) + if (offset < h->real_size) { + if (offset + count <= h->real_size) n = count; else - n = real_size_copy - offset; + n = h->real_size - offset; return next_ops->zero (nxdata, n, offset, flags, err); } return 0; @@ -292,21 +306,17 @@ truncate_extents (struct nbdkit_next_ops *next_ops, void *nxdata, uint32_t flags, struct nbdkit_extents *extents, int *err) { uint32_t n; - uint64_t real_size_copy; + struct handle *h = handle; CLEANUP_EXTENTS_FREE struct nbdkit_extents *extents2 = NULL; size_t i; - pthread_mutex_lock (&lock); - real_size_copy = real_size; - pthread_mutex_unlock (&lock); - /* If the entire request is beyond the end of the underlying plugin * then this is the easy case: return a hole up to the end of the * file. */ - if (offset >= real_size_copy) { + if (offset >= h->real_size) { int r = nbdkit_add_extent (extents, - real_size_copy, truncate_size - real_size_copy, + h->real_size, truncate_size - h->real_size, NBDKIT_EXTENT_ZERO|NBDKIT_EXTENT_HOLE); if (r == -1) *err = errno; @@ -322,15 +332,15 @@ truncate_extents (struct nbdkit_next_ops *next_ops, void *nxdata, * have to create a new extents array, ask the plugin, then copy the * returned data to the original array. */ - extents2 = nbdkit_extents_new (offset, real_size_copy); + extents2 = nbdkit_extents_new (offset, h->real_size); if (extents2 == NULL) { *err = errno; return -1; } - if (offset + count <= real_size_copy) + if (offset + count <= h->real_size) n = count; else - n = real_size_copy - offset; + n = h->real_size - offset; if (next_ops->extents (nxdata, n, offset, flags, extents2, err) == -1) return -1; @@ -352,6 +362,8 @@ static struct nbdkit_filter filter = { .version = PACKAGE_VERSION, .config = truncate_config, .config_help = truncate_config_help, + .open = truncate_open, + .close = truncate_close, .prepare = truncate_prepare, .get_size = truncate_get_size, .pread = truncate_pread, -- 2.20.1
Eric Blake
2019-Apr-27 21:26 UTC
[Libguestfs] [nbdkit PATCH 3/4] truncate: Test for safe multi-connect size handling
Test the just-fixed bug in the truncate filter. Although it is in general unsafe to shrink an image being served by the file plugin, the test is careful to only issue operations on connection A after the file has been resized back to its original length, in order to prove whether the smaller size observed during the short-lived connection B bleeds over into the pread results seen by connection A. If this test is applied without the previous patch, the second read from A sees all zeroes instead of the desired pattern 2. Signed-off-by: Eric Blake <eblake@redhat.com> --- tests/Makefile.am | 2 + tests/test-truncate4.sh | 85 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 87 insertions(+) create mode 100755 tests/test-truncate4.sh diff --git a/tests/Makefile.am b/tests/Makefile.am index 4b7aa22..4148793 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -122,6 +122,7 @@ EXTRA_DIST = \ test-truncate1.sh \ test-truncate2.sh \ test-truncate3.sh \ + test-truncate4.sh \ test-truncate-extents.sh \ test-vddk.sh \ test-vddk-real.sh \ @@ -937,6 +938,7 @@ TESTS += \ test-truncate1.sh \ test-truncate2.sh \ test-truncate3.sh \ + test-truncate4.sh \ test-truncate-extents.sh # xz filter test. diff --git a/tests/test-truncate4.sh b/tests/test-truncate4.sh new file mode 100755 index 0000000..e4be626 --- /dev/null +++ b/tests/test-truncate4.sh @@ -0,0 +1,85 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2019 Red Hat Inc. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +# Regression test when next_ops->get_size changes between connections. +# For now, NBD does not support dynamic resize; but the file plugin +# reads size from the file system for each new connection, at which +# point the client remembers that size for the life of the connection. +# We are testing that connection A can still see the tail of a file, +# even when connection B is opened while the file was temporarily +# shorter (if the actions of connection B affect the size visible +# through connection A, we didn't isolate per-connection state). + +source ./functions.sh +set -e +set -x + +requires qemu-io --version + +sock=`mktemp -u` +files="truncate4.out truncate4.pid $sock truncate4.data" +rm -f $files +cleanup_fn rm -f $files + +# Initial file contents: 1k of pattern 1 +truncate -s 1024 truncate4.data +qemu-io -c 'w -P 1 0 1024' -f raw truncate4.data + +# Run nbdkit with file plugin and truncate filter in front. +start_nbdkit -P truncate4.pid -U $sock \ + --filter=truncate \ + file truncate4.data \ + round-up=1024 + +fail=0 +exec 4>&1 # Save original stdout +{ + exec 5>&1 >&4 # Save connection A, set stdout back to original + echo 'Reading from connection A, try 1' + echo 'r -P 1 0 1024' >&5 + sleep 1 + echo 'Resizing down' + truncate -s 512 truncate4.data + echo 'Reading from connection B' + echo 'r -P 1 0 512' | qemu-io -f raw nbd:unix:$sock >> truncate4.out + echo 'Restoring size' + truncate -s 1024 truncate4.data + qemu-io -c 'w -P 2 0 1024' -f raw truncate4.data + echo 'Reading from connection A, try 2' + echo 'r -P 2 512 512' >&5 + echo 'quit' >&5 +} | qemu-io -f raw nbd:unix:$sock >> truncate4.out || fail=1 +exec 4>&- + +cat truncate4.out +grep 'Pattern verification failed' truncate4.out && fail=1 +exit $fail -- 2.20.1
Eric Blake
2019-Apr-27 21:26 UTC
[Libguestfs] [nbdkit PATCH 4/4] doc: More details on (lack of) dynamic sizing
We already documented in the lifetime section that .get_size is in the same boat as .can_write and friends, where the values reported may change between connections, but should generally remain constant for a given connection and that other parts of the code may cache rather than constantly re-check the callback. However, it is worth a bit more detail directly on the .get_size callback, in particular mentioning that a server that allows externally growing images is generally safe. When the NBD protocol extension for dynamic resize is finally implemented, we'll probably have to add more rules on how and when size can change. Signed-off-by: Eric Blake <eblake@redhat.com> --- docs/nbdkit-plugin.pod | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod index e9dc34f..15d571f 100644 --- a/docs/nbdkit-plugin.pod +++ b/docs/nbdkit-plugin.pod @@ -479,6 +479,15 @@ check for errors, but obviously this is outside the scope of nbdkit. This is called during the option negotiation phase of the protocol to get the size (in bytes) of the block device being exported. +The NBD protocol does not currently provide a way for a client to +learn about or request any dynamic size changes. Thus, although this +callback may return different values across different C<.open> calls, +other code in nbdkit may cache the size reported during open, rather +than checking for the current size. It is generally safe to report a +larger size than before (even if the client will not access beyond the +original size), but risky to shrink an image below the size in use by +any current connection. + The returned size must be E<ge> 0. If there is an error, C<.get_size> should call C<nbdkit_error> with an error message and return C<-1>. -- 2.20.1
Richard W.M. Jones
2019-Apr-29 12:23 UTC
Re: [Libguestfs] [nbdkit PATCH 1/4] filters: Drop useless .open callbacks
On Sat, Apr 27, 2019 at 04:26:43PM -0500, Eric Blake wrote:> The cache filter .open has never done any useful work; since its > introduction in c10d126f, it appears to exist purely as copy-and-paste > from the cow filter (differing only in whether the readonly parameter > is massaged before passing on to next). > > The error filter .open used to track a per-connection handle, but for > testing purposes, it was changed to use only global state in commit > b33ccbb8. > > Drop these two .open callbacks in favor of using nbdkit's default > behavior, with no semantic change. All other filters with an .open > callback either massage parameters passed to next, create a > per-connection handle, or both.Yes these are both bogus and removing them is correct, ACK. Rich.> Signed-off-by: Eric Blake <eblake@redhat.com> > --- > filters/cache/cache.c | 10 ---------- > filters/error/error.c | 10 ---------- > 2 files changed, 20 deletions(-) > > diff --git a/filters/cache/cache.c b/filters/cache/cache.c > index b3fef42..19ce555 100644 > --- a/filters/cache/cache.c > +++ b/filters/cache/cache.c > @@ -186,15 +186,6 @@ cache_config_complete (nbdkit_next_config_complete *next, void *nxdata) > return next (nxdata); > } > > -static void * > -cache_open (nbdkit_next_open *next, void *nxdata, int readonly) > -{ > - if (next (nxdata, readonly) == -1) > - return NULL; > - > - return NBDKIT_HANDLE_NOT_NEEDED; > -} > - > /* Get the file size and ensure the cache is the correct size. */ > static int64_t > cache_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, > @@ -476,7 +467,6 @@ static struct nbdkit_filter filter = { > .unload = cache_unload, > .config = cache_config, > .config_complete = cache_config_complete, > - .open = cache_open, > .prepare = cache_prepare, > .get_size = cache_get_size, > .pread = cache_pread, > diff --git a/filters/error/error.c b/filters/error/error.c > index add7566..8932292 100644 > --- a/filters/error/error.c > +++ b/filters/error/error.c > @@ -252,15 +252,6 @@ error_config (nbdkit_next_config *next, void *nxdata, > "error-pread*, error-pwrite*, error-trim*, error-zero*, error-extents*\n" \ > " Apply settings only to read/write/etc" > > -static void * > -error_open (nbdkit_next_open *next, void *nxdata, int readonly) > -{ > - if (next (nxdata, readonly) == -1) > - return NULL; > - > - return NBDKIT_HANDLE_NOT_NEEDED; > -} > - > /* This function injects a random error. */ > static bool > random_error (const struct error_settings *error_settings, > @@ -366,7 +357,6 @@ static struct nbdkit_filter filter = { > .unload = error_unload, > .config = error_config, > .config_help = error_config_help, > - .open = error_open, > .pread = error_pread, > .pwrite = error_pwrite, > .trim = error_trim, > -- > 2.20.1 > > _______________________________________________ > Libguestfs mailing list > Libguestfs@redhat.com > https://www.redhat.com/mailman/listinfo/libguestfs-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Richard W.M. Jones
2019-Apr-29 12:27 UTC
Re: [Libguestfs] [nbdkit PATCH 2/4] truncate: Fix corruption when plugin changes per-connection size
On Sat, Apr 27, 2019 at 04:26:44PM -0500, Eric Blake wrote:> The truncate filter tried to be careful to lock access to setting or > reading the real_size variable learned from calling > next_ops->get_size, in anticipation of not behaving incorrectly if the > NBD protocol makes dynamic resize supported, and where the global > variable could serve as a cache rather than having to always call > next_ops->get_size to recompute things. However, nbdkit-plugin.pod > already documents that .get_size and other option negotiation > callbacks may return different values on a per-connection basis, but > that within a connection those results should be constant because they > may be cached. And our lock does not prevent the following race: > > thread A thread B > .prepare > - lock > - next_ops->get_size returns A > - set real_size based on A > - unlock > .pread > .prepare > - lock > - next_ops->get_size returns B > - set real_size based on B > - unlock > - lock > - set real_size_copy based on B > - unlock > - decide whether to call next_ops->pread using wrong real_size > > If we are worried that next_ops->get_size() can change (even if our > current documentation says it should not), then we must call it every > time, rather than relying on an older cached version of the size; > conversely, if we are trying to cache things, then we are better off > caching it in a per-connection handle instead of globally between > connections. Until the NBD protocol provides a way to advertise new > sizes to clients, then our per-connection handle is effectively > readonly after .prepare, so we don't even have to worry about locks > (and if we DO want correct locking, it would be better to have a > rwlock with get_size holding write and all other ops holding a read > for the duration of the call, rather than just for the moment it > copies from a global variable). > > The next commit abuses the file plugin to demonstrate the above race > where a global size cache changed by connection B can affect the > behavior seen by connection A. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > filters/truncate/truncate.c | 162 +++++++++++++++++++----------------- > 1 file changed, 87 insertions(+), 75 deletions(-) > > diff --git a/filters/truncate/truncate.c b/filters/truncate/truncate.c > index 5f3370d..dfa9105 100644 > --- a/filters/truncate/truncate.c > +++ b/filters/truncate/truncate.c > @@ -1,5 +1,5 @@ > /* nbdkit > - * Copyright (C) 2018 Red Hat Inc. > + * Copyright (C) 2018-2019 Red Hat Inc. > * > * Redistribution and use in source and binary forms, with or without > * modification, are permitted provided that the following conditions are > @@ -54,15 +54,6 @@ > static int64_t truncate_size = -1; > static unsigned round_up = 0, round_down = 0; > > -/* The real size of the underlying plugin. */ > -static uint64_t real_size; > - > -/* The calculated size after applying the parameters. */ > -static uint64_t size; > - > -/* This lock protects the real_size and size fields. */ > -static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; > - > static int > parse_round_param (const char *key, const char *value, unsigned *ret) > { > @@ -121,51 +112,90 @@ truncate_config (nbdkit_next_config *next, void *nxdata, > "round-up=<N> Round up to next multiple of N.\n" \ > "round-down=<N> Round down to multiple of N." > > -static int64_t truncate_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); > +/* Per-connection state. Until the NBD protocol gains dynamic resize > + * support, each connection remembers the size of the underlying > + * plugin at open (even if that size differs between connections > + * because the plugin tracks external resize effects). > + */ > +struct handle { > + /* The real size of the underlying plugin. */ > + uint64_t real_size; > > -/* In prepare, force a call to get_size which sets the real_size & size > - * globals. > + /* The calculated size after applying the parameters. */ > + uint64_t size; > +}; > + > +/* Open a connection. */ > +static void * > +truncate_open (nbdkit_next_open *next, void *nxdata, int readonly) > +{ > + struct handle *h; > + > + if (next (nxdata, readonly) == -1) > + return NULL; > + > + h = malloc (sizeof *h); /* h is populated during .prepare */ > + if (h == NULL) { > + nbdkit_error ("malloc: %m"); > + return NULL; > + } > + > + return h; > +} > + > +static void > +truncate_close (void *handle) > +{ > + struct handle *h = handle; > + > + free (h); > +} > + > +/* In prepare, force a call to next_ops->get_size in order to set > + * per-connection real_size & size; these values are not changed > + * during the life of the connection. > */ > static int > truncate_prepare (struct nbdkit_next_ops *next_ops, void *nxdata, > void *handle) > { > int64_t r; > - > - r = truncate_get_size (next_ops, nxdata, handle); > - return r >= 0 ? 0 : -1; > -} > - > -/* Get the size. As a side effect, calculate the size to serve. */ > -static int64_t > -truncate_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, > - void *handle) > -{ > - int64_t r, ret; > + struct handle *h = handle; > > r = next_ops->get_size (nxdata); > if (r == -1) > return -1; > > - pthread_mutex_lock (&lock); > - > - real_size = size = r; > + h->real_size = h->size = r; > > /* The truncate, round-up and round-down parameters are treated as > * separate operations. It's possible to specify more than one, > * although perhaps not very useful. > */ > if (truncate_size >= 0) > - size = truncate_size; > + h->size = truncate_size; > if (round_up > 0) > - size = ROUND_UP (size, round_up); > + h->size = ROUND_UP (h->size, round_up); > if (round_down > 0) > - size = ROUND_DOWN (size, round_down); > - ret = size; > + h->size = ROUND_DOWN (h->size, round_down); > > - pthread_mutex_unlock (&lock); > + return r >= 0 ? 0 : -1; > +} > > - return ret; > +/* Get the size. As a side effect, calculate the size to serve. */ > +static int64_t > +truncate_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, > + void *handle) > +{ > + struct handle *h = handle; > + > + /* If the NBD protocol and nbdkit adds dynamic resize, we'll need a > + * rwlock where get_size holds write lock and all other ops hold > + * read lock. Until then, NBD sizes are unchanging (even if the > + * underlying plugin can react to external size changes), so just > + * returned what we cached at connection open. > + */ > + return h->size; > } > > /* Read data. */ > @@ -176,17 +206,13 @@ truncate_pread (struct nbdkit_next_ops *next_ops, void *nxdata, > { > int r; > uint32_t n; > - uint64_t real_size_copy; > + struct handle *h = handle; > > - pthread_mutex_lock (&lock); > - real_size_copy = real_size; > - pthread_mutex_unlock (&lock); > - > - if (offset < real_size_copy) { > - if (offset + count <= real_size_copy) > + if (offset < h->real_size) { > + if (offset + count <= h->real_size) > n = count; > else > - n = real_size_copy - offset; > + n = h->real_size - offset; > r = next_ops->pread (nxdata, buf, n, offset, flags, err); > if (r == -1) > return -1; > @@ -209,17 +235,13 @@ truncate_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, > { > int r; > uint32_t n; > - uint64_t real_size_copy; > + struct handle *h = handle; > > - pthread_mutex_lock (&lock); > - real_size_copy = real_size; > - pthread_mutex_unlock (&lock); > - > - if (offset < real_size_copy) { > - if (offset + count <= real_size_copy) > + if (offset < h->real_size) { > + if (offset + count <= h->real_size) > n = count; > else > - n = real_size_copy - offset; > + n = h->real_size - offset; > r = next_ops->pwrite (nxdata, buf, n, offset, flags, err); > if (r == -1) > return -1; > @@ -246,17 +268,13 @@ truncate_trim (struct nbdkit_next_ops *next_ops, void *nxdata, > uint32_t flags, int *err) > { > uint32_t n; > - uint64_t real_size_copy; > + struct handle *h = handle; > > - pthread_mutex_lock (&lock); > - real_size_copy = real_size; > - pthread_mutex_unlock (&lock); > - > - if (offset < real_size_copy) { > - if (offset + count <= real_size_copy) > + if (offset < h->real_size) { > + if (offset + count <= h->real_size) > n = count; > else > - n = real_size_copy - offset; > + n = h->real_size - offset; > return next_ops->trim (nxdata, n, offset, flags, err); > } > return 0; > @@ -269,17 +287,13 @@ truncate_zero (struct nbdkit_next_ops *next_ops, void *nxdata, > uint32_t flags, int *err) > { > uint32_t n; > - uint64_t real_size_copy; > + struct handle *h = handle; > > - pthread_mutex_lock (&lock); > - real_size_copy = real_size; > - pthread_mutex_unlock (&lock); > - > - if (offset < real_size_copy) { > - if (offset + count <= real_size_copy) > + if (offset < h->real_size) { > + if (offset + count <= h->real_size) > n = count; > else > - n = real_size_copy - offset; > + n = h->real_size - offset; > return next_ops->zero (nxdata, n, offset, flags, err); > } > return 0; > @@ -292,21 +306,17 @@ truncate_extents (struct nbdkit_next_ops *next_ops, void *nxdata, > uint32_t flags, struct nbdkit_extents *extents, int *err) > { > uint32_t n; > - uint64_t real_size_copy; > + struct handle *h = handle; > CLEANUP_EXTENTS_FREE struct nbdkit_extents *extents2 = NULL; > size_t i; > > - pthread_mutex_lock (&lock); > - real_size_copy = real_size; > - pthread_mutex_unlock (&lock); > - > /* If the entire request is beyond the end of the underlying plugin > * then this is the easy case: return a hole up to the end of the > * file. > */ > - if (offset >= real_size_copy) { > + if (offset >= h->real_size) { > int r = nbdkit_add_extent (extents, > - real_size_copy, truncate_size - real_size_copy, > + h->real_size, truncate_size - h->real_size, > NBDKIT_EXTENT_ZERO|NBDKIT_EXTENT_HOLE); > if (r == -1) > *err = errno; > @@ -322,15 +332,15 @@ truncate_extents (struct nbdkit_next_ops *next_ops, void *nxdata, > * have to create a new extents array, ask the plugin, then copy the > * returned data to the original array. > */ > - extents2 = nbdkit_extents_new (offset, real_size_copy); > + extents2 = nbdkit_extents_new (offset, h->real_size); > if (extents2 == NULL) { > *err = errno; > return -1; > } > - if (offset + count <= real_size_copy) > + if (offset + count <= h->real_size) > n = count; > else > - n = real_size_copy - offset; > + n = h->real_size - offset; > if (next_ops->extents (nxdata, n, offset, flags, extents2, err) == -1) > return -1; > > @@ -352,6 +362,8 @@ static struct nbdkit_filter filter = { > .version = PACKAGE_VERSION, > .config = truncate_config, > .config_help = truncate_config_help, > + .open = truncate_open, > + .close = truncate_close, > .prepare = truncate_prepare, > .get_size = truncate_get_size, > .pread = truncate_pread,Yes this patch is fine, ACK. We might be able to remove #include <pthread.h> too if nothing else uses pthread functions in the file. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Richard W.M. Jones
2019-Apr-29 12:29 UTC
Re: [Libguestfs] [nbdkit PATCH 3/4] truncate: Test for safe multi-connect size handling
On Sat, Apr 27, 2019 at 04:26:45PM -0500, Eric Blake wrote:> Test the just-fixed bug in the truncate filter. Although it is in > general unsafe to shrink an image being served by the file plugin, the > test is careful to only issue operations on connection A after the > file has been resized back to its original length, in order to prove > whether the smaller size observed during the short-lived connection B > bleeds over into the pread results seen by connection A. If this test > is applied without the previous patch, the second read from A sees all > zeroes instead of the desired pattern 2. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > tests/Makefile.am | 2 + > tests/test-truncate4.sh | 85 +++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 87 insertions(+) > create mode 100755 tests/test-truncate4.sh > > diff --git a/tests/Makefile.am b/tests/Makefile.am > index 4b7aa22..4148793 100644 > --- a/tests/Makefile.am > +++ b/tests/Makefile.am > @@ -122,6 +122,7 @@ EXTRA_DIST = \ > test-truncate1.sh \ > test-truncate2.sh \ > test-truncate3.sh \ > + test-truncate4.sh \ > test-truncate-extents.sh \ > test-vddk.sh \ > test-vddk-real.sh \ > @@ -937,6 +938,7 @@ TESTS += \ > test-truncate1.sh \ > test-truncate2.sh \ > test-truncate3.sh \ > + test-truncate4.sh \ > test-truncate-extents.sh > > # xz filter test. > diff --git a/tests/test-truncate4.sh b/tests/test-truncate4.sh > new file mode 100755 > index 0000000..e4be626 > --- /dev/null > +++ b/tests/test-truncate4.sh > @@ -0,0 +1,85 @@ > +#!/usr/bin/env bash > +# nbdkit > +# Copyright (C) 2019 Red Hat Inc. > +# > +# Redistribution and use in source and binary forms, with or without > +# modification, are permitted provided that the following conditions are > +# met: > +# > +# * Redistributions of source code must retain the above copyright > +# notice, this list of conditions and the following disclaimer. > +# > +# * Redistributions in binary form must reproduce the above copyright > +# notice, this list of conditions and the following disclaimer in the > +# documentation and/or other materials provided with the distribution. > +# > +# * Neither the name of Red Hat nor the names of its contributors may be > +# used to endorse or promote products derived from this software without > +# specific prior written permission. > +# > +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND > +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, > +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A > +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR > +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF > +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND > +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, > +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT > +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > +# SUCH DAMAGE. > + > +# Regression test when next_ops->get_size changes between connections. > +# For now, NBD does not support dynamic resize; but the file plugin > +# reads size from the file system for each new connection, at which > +# point the client remembers that size for the life of the connection. > +# We are testing that connection A can still see the tail of a file, > +# even when connection B is opened while the file was temporarily > +# shorter (if the actions of connection B affect the size visible > +# through connection A, we didn't isolate per-connection state). > + > +source ./functions.sh > +set -e > +set -x > + > +requires qemu-io --version > + > +sock=`mktemp -u` > +files="truncate4.out truncate4.pid $sock truncate4.data" > +rm -f $files > +cleanup_fn rm -f $files > + > +# Initial file contents: 1k of pattern 1 > +truncate -s 1024 truncate4.data > +qemu-io -c 'w -P 1 0 1024' -f raw truncate4.data > + > +# Run nbdkit with file plugin and truncate filter in front. > +start_nbdkit -P truncate4.pid -U $sock \ > + --filter=truncate \ > + file truncate4.data \ > + round-up=1024 > + > +fail=0 > +exec 4>&1 # Save original stdout > +{ > + exec 5>&1 >&4 # Save connection A, set stdout back to original > + echo 'Reading from connection A, try 1' > + echo 'r -P 1 0 1024' >&5 > + sleep 1 > + echo 'Resizing down' > + truncate -s 512 truncate4.data > + echo 'Reading from connection B' > + echo 'r -P 1 0 512' | qemu-io -f raw nbd:unix:$sock >> truncate4.out > + echo 'Restoring size' > + truncate -s 1024 truncate4.data > + qemu-io -c 'w -P 2 0 1024' -f raw truncate4.data > + echo 'Reading from connection A, try 2' > + echo 'r -P 2 512 512' >&5 > + echo 'quit' >&5 > +} | qemu-io -f raw nbd:unix:$sock >> truncate4.out || fail=1 > +exec 4>&- > + > +cat truncate4.out > +grep 'Pattern verification failed' truncate4.out && fail=1 > +exit $failI guess this test will break in future if we implement and return ‘can_resize = true’ in the file plugin? Or will the (also updated) truncate filter be able to cope? Anyway since we can adjust or remove the test if necessary later, ACK RIch. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Richard W.M. Jones
2019-Apr-29 13:08 UTC
Re: [Libguestfs] [nbdkit PATCH 4/4] doc: More details on (lack of) dynamic sizing
On Sat, Apr 27, 2019 at 04:26:46PM -0500, Eric Blake wrote:> We already documented in the lifetime section that .get_size is in the > same boat as .can_write and friends, where the values reported may > change between connections, but should generally remain constant for a > given connection and that other parts of the code may cache rather > than constantly re-check the callback. However, it is worth a bit > more detail directly on the .get_size callback, in particular > mentioning that a server that allows externally growing images is > generally safe. > > When the NBD protocol extension for dynamic resize is finally > implemented, we'll probably have to add more rules on how and when > size can change. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > docs/nbdkit-plugin.pod | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod > index e9dc34f..15d571f 100644 > --- a/docs/nbdkit-plugin.pod > +++ b/docs/nbdkit-plugin.pod > @@ -479,6 +479,15 @@ check for errors, but obviously this is outside the scope of nbdkit. > This is called during the option negotiation phase of the protocol > to get the size (in bytes) of the block device being exported. > > +The NBD protocol does not currently provide a way for a client to > +learn about or request any dynamic size changes. Thus, although this > +callback may return different values across different C<.open> calls, > +other code in nbdkit may cache the size reported during open, rather > +than checking for the current size. It is generally safe to report a > +larger size than before (even if the client will not access beyond the > +original size), but risky to shrink an image below the size in use by > +any current connection.I'm in two minds about documenting things that (a) might change in future and (b) seem a bit dangerous -- is it really safe to report larger sizes? So personally I'd either leave this undocumented, or if you want to write something then say that at the moment the plugin should always return the same size, but future changes to the NBD protocol might implement resizing with the cooperation of the plugin. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Seemingly Similar Threads
- [nbdkit PATCH 0/4] More mutex sanity checking
- [PATCH v2 nbdkit 0/6] Add truncate filter and other fixes.
- [nbdkit PATCH] offset, partition: Fix .extents with non-zero offset
- [PATCH nbdkit v5 FINAL 00/19] Implement extents.
- [nbdkit PATCH 0/4] Start using cleanup macros in filters/plugins