Eric Blake
2018-Feb-01 03:26 UTC
[Libguestfs] [nbdkit PATCH v2 0/3] add log, blocksize filters
Since v1: add the blocksize filter, add testsuite coverage of the log filter, several fixes to the log filter based on what adding tests revealed I'm still working on FUA flag support patches on top of this; the patches should all be committed in the same release, as we want to minimize the number of releases that cause a filter ABI/API bump Eric Blake (3): backend: Rework internal/filter error return semantics filters: Add log filter filters: Add blocksize filter TODO | 7 - docs/nbdkit-filter.pod | 85 +++++- docs/nbdkit.pod | 2 + filters/blocksize/nbdkit-blocksize-filter.pod | 141 ++++++++++ filters/log/nbdkit-log-filter.pod | 115 ++++++++ configure.ac | 4 +- src/internal.h | 1 - src/connections.c | 45 +-- src/filters.c | 81 ++++-- src/plugins.c | 66 +++-- filters/blocksize/blocksize.c | 379 ++++++++++++++++++++++++++ filters/cache/cache.c | 49 ++-- filters/cow/cow.c | 28 +- filters/log/log.c | 362 ++++++++++++++++++++++++ filters/partition/partition.c | 2 +- filters/Makefile.am | 2 + filters/blocksize/Makefile.am | 62 +++++ filters/log/Makefile.am | 62 +++++ tests/Makefile.am | 8 + tests/test-blocksize.sh | 152 +++++++++++ tests/test-log.sh | 88 ++++++ 21 files changed, 1599 insertions(+), 142 deletions(-) create mode 100644 filters/blocksize/nbdkit-blocksize-filter.pod create mode 100644 filters/log/nbdkit-log-filter.pod create mode 100644 filters/blocksize/blocksize.c create mode 100644 filters/log/log.c create mode 100644 filters/blocksize/Makefile.am create mode 100644 filters/log/Makefile.am create mode 100755 tests/test-blocksize.sh create mode 100755 tests/test-log.sh -- 2.14.3
Eric Blake
2018-Feb-01 03:26 UTC
[Libguestfs] [nbdkit PATCH v2 1/3] backend: Rework internal/filter error return semantics
Previously, we let a plugin set an error in either thread-local storage (nbdkit_set_error()) or errno, then connections.c would decode which error to use. But with filters in the mix, it is very difficult for a filter to know what error was set by the plugin (particularly since nbdkit_set_error() has no public counterpart for reading the thread-local storage). What's more, if a filter does any non-trivial processing after calling into next_ops, it is very probable that errno might be corrupted, which would affect the error returned by a plugin that relied on errno but not the error stored in thread-local storage. Better is to change the backend interface to just pass the direct error value, by moving the decoding of thread-local vs. errno into plugins.c. With the change in decoding location, the backend interface no longer needs an .errno_is_preserved() callback. For maximum convenience, this change lets a filter return an error either by passing through the underlying plugin return (a positive error) or by setting -1 and storing something in errno. However, I did have to tweak some of the existing filters to actually handle and/or return the right error; which means this IS a subtle change to the filter interface (and worse, one that cannot be detected by the compiler because the API signatures did not change). However, more ABI changes are planned with adding FUA support, so as long as it is all done as part of the same release, we are okay. Also note that only nbdkit-plugin.h declares nbdkit_set_error(); we can actually keep it this way (filters do not need to call that function). Signed-off-by: Eric Blake <eblake at redhat.com> --- docs/nbdkit-filter.pod | 83 +++++++++++++++++++++++++++++++++++++------ src/internal.h | 1 - src/connections.c | 45 ++++++----------------- src/filters.c | 81 +++++++++++++++++++++++++---------------- src/plugins.c | 66 +++++++++++++++++++--------------- filters/cache/cache.c | 49 +++++++++++++++---------- filters/cow/cow.c | 28 +++++++++------ filters/partition/partition.c | 2 +- 8 files changed, 221 insertions(+), 134 deletions(-) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index eb72dae..4ddf25d 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -163,10 +163,14 @@ short-circuited. The filter?s other methods like C<.prepare>, C<.get_size>, C<.pread> etc ? always called in the context of a connection ? are passed a -pointer to C<struct nbdkit_next_ops> which contains a subset of the -plugin methods that can be called during a connection. It is possible -for a filter to issue (for example) extra read calls in response to a -single C<.pwrite> call. +pointer to C<struct nbdkit_next_ops> which contains a comparable set +of accessors to plugin methods that can be called during a connection. +It is possible for a filter to issue (for example) extra read calls in +response to a single C<.pwrite> call. Note that the semantics of the +functions in C<struct nbdkit_next_ops> are slightly different from +what a plugin implements: for example, while a plugin's C<.pread> +returns -1 on error, C<next_ops->pread> returns a positive errno +value. You can modify parameters when you call the C<next> function. However be careful when modifying strings because for some methods @@ -324,6 +328,11 @@ will see. The returned size must be E<ge> 0. If there is an error, C<.get_size> should call C<nbdkit_error> with an error message and return C<-1>. +If this function is called more than once for the same connection, it +should return the same value; similarly, the filter may cache +C<next_ops->get_size> for a given connection rather than repeating +calls. Note that if C<next_ops->get_size> fails, the value of +C<errno> is indeterminate. =head2 C<.can_write> @@ -346,7 +355,11 @@ should call C<nbdkit_error> with an error message and return C<-1>. These intercept the corresponding plugin methods. If there is an error, the callback should call C<nbdkit_error> with an -error message and return C<-1>. +error message and return C<-1>. If these functions are called more +than once for the same connection, they should return the same value; +similarly, the filter may cache the results of each counterpart in +C<next_ops> for a given connection rather than repeating calls. Note +that if C<next_ops> fails, the value of C<errno> is indeterminate. =head2 C<.pread> @@ -356,9 +369,15 @@ error message and return C<-1>. This intercepts the plugin C<.pread> method and can be used to read or modify data read by the plugin. +Note that unlike the plugin C<.pread> which returns -1 on error, +C<next_ops->pread> returns a positive errno value on error; the filter +should use this return value to learn why the plugin failed, and not +rely on C<errno>. + If there is an error (including a short read which couldn't be recovered from), C<.pread> should call C<nbdkit_error> with an error -message B<and> set C<errno>, then return C<-1>. +message B<and> either return -1 with C<errno> set or return a positive +errno value corresponding to the problem. =head2 C<.pwrite> @@ -369,9 +388,15 @@ message B<and> set C<errno>, then return C<-1>. This intercepts the plugin C<.pwrite> method and can be used to modify data written by the plugin. +Note that unlike the plugin C<.pwrite> which returns -1 on error, +C<next_ops->pwrite> returns a positive errno value on error; the +filter should use this return value to learn why the plugin failed, +and not rely on C<errno>. + If there is an error (including a short write which couldn't be recovered from), C<.pwrite> should call C<nbdkit_error> with an error -message B<and> set C<errno>, then return C<-1>. +message B<and> either return -1 with C<errno> set or return a positive +errno value corresponding to the problem. =head2 C<.flush> @@ -381,8 +406,14 @@ message B<and> set C<errno>, then return C<-1>. This intercepts the plugin C<.flush> method and can be used to modify flush requests. +Note that unlike the plugin C<.flush> which returns -1 on error, +C<next_ops->flush> returns a positive errno value on error; the filter +should use this return value to learn why the plugin failed, and not +rely on C<errno>. + If there is an error, C<.flush> should call C<nbdkit_error> with an -error message B<and> set C<errno>, then return C<-1>. +error message B<and> either return -1 with C<errno> set or return a +positive errno value corresponding to the problem. =head2 C<.trim> @@ -392,8 +423,14 @@ error message B<and> set C<errno>, then return C<-1>. This intercepts the plugin C<.trim> method and can be used to modify trim requests. +Note that unlike the plugin C<.trim> which returns -1 on error, +C<next_ops->trim> returns a positive errno value on error; the filter +should use this return value to learn why the plugin failed, and not +rely on C<errno>. + If there is an error, C<.trim> should call C<nbdkit_error> with an -error message B<and> set C<errno>, then return C<-1>. +error message B<and> either return -1 with C<errno> set or return a +positive errno value corresponding to the problem. =head2 C<.zero> @@ -403,8 +440,34 @@ error message B<and> set C<errno>, then return C<-1>. This intercepts the plugin C<.zero> method and can be used to modify zero requests. +Note that unlike the plugin C<.zero> which returns -1 on error, +C<next_ops->zero> returns a positive errno value on error; the filter +should use this return value to learn why the plugin failed, and not +rely on C<errno>. Furthermore, C<next_ops->zero> will never return +C<ENOTSUP>; the plugin will have already fallen back to using +C<.pwrite> instead. + If there is an error, C<.zero> should call C<nbdkit_error> with an -error message B<and> set C<errno>, then return C<-1>. +error message B<and> either return -1 with C<errno> set or return a +positive errno value corresponding to the problem. The filter should +never fail with C<ENOTSUP> (while plugins have automatic fallback to +C<.pwrite>, filters do not). + +=head1 ERROR HANDLING + +If there is an error in the filter itself, the filter should call +C<nbdkit_error> to report an error message. If the callback is +involved in serving data, the return value determines the error code +that will be sent to the client; other callbacks should return the +appropriate error indication, eg. C<NULL> or C<-1>. + +C<nbdkit_error> has the following prototype and works like +L<printf(3)>: + + void nbdkit_error (const char *fs, ...); + void nbdkit_verror (const char *fs, va_list args); + +For convenience, C<nbdkit_error> preserves the value of C<errno>. =head1 DEBUGGING diff --git a/src/internal.h b/src/internal.h index 3cbfde5..2cda390 100644 --- a/src/internal.h +++ b/src/internal.h @@ -176,7 +176,6 @@ struct backend { void (*dump_fields) (struct backend *); void (*config) (struct backend *, const char *key, const char *value); void (*config_complete) (struct backend *); - int (*errno_is_preserved) (struct backend *); int (*open) (struct backend *, struct connection *conn, int readonly); int (*prepare) (struct backend *, struct connection *conn); int (*finalize) (struct backend *, struct connection *conn); diff --git a/src/connections.c b/src/connections.c index 75c2c2d..2959493 100644 --- a/src/connections.c +++ b/src/connections.c @@ -43,6 +43,7 @@ #include <endian.h> #include <sys/types.h> #include <stddef.h> +#include <assert.h> #include <pthread.h> @@ -912,18 +913,6 @@ validate_request (struct connection *conn, return true; /* Command validates. */ } -/* Grab the appropriate error value. - */ -static int -get_error (struct connection *conn) -{ - int ret = threadlocal_get_error (); - - if (!ret && backend->errno_is_preserved (backend)) - ret = errno; - return ret ? ret : EIO; -} - /* This is called with the request lock held to actually execute the * request (by calling the plugin). Note that the request fields have * been validated already in 'validate_request' so we don't have to @@ -942,49 +931,36 @@ handle_request (struct connection *conn, uint32_t f = 0; bool fua = conn->can_flush && (flags & NBD_CMD_FLAG_FUA); - /* The plugin should call nbdkit_set_error() to request a particular - error, otherwise we fallback to errno or EIO. */ + /* Clear the error, so that we know if the plugin calls + nbdkit_set_error() or relied on errno. */ threadlocal_set_error (0); switch (cmd) { case NBD_CMD_READ: - if (backend->pread (backend, conn, buf, count, offset, 0) == -1) - return get_error (conn); - break; + return backend->pread (backend, conn, buf, count, offset, 0); case NBD_CMD_WRITE: if (fua) f |= NBDKIT_FLAG_FUA; - if (backend->pwrite (backend, conn, buf, count, offset, f) == -1) - return get_error (conn); - break; + return backend->pwrite (backend, conn, buf, count, offset, f); case NBD_CMD_FLUSH: - if (backend->flush (backend, conn, 0) == -1) - return get_error (conn); - break; + return backend->flush (backend, conn, 0); case NBD_CMD_TRIM: if (fua) f |= NBDKIT_FLAG_FUA; - if (backend->trim (backend, conn, count, offset, f) == -1) - return get_error (conn); - break; + return backend->trim (backend, conn, count, offset, f); case NBD_CMD_WRITE_ZEROES: if (!(flags & NBD_CMD_FLAG_NO_HOLE)) f |= NBDKIT_FLAG_MAY_TRIM; if (fua) f |= NBDKIT_FLAG_FUA; - if (backend->zero (backend, conn, count, offset, f) == -1) - return get_error (conn); - break; - - default: - abort (); + return backend->zero (backend, conn, count, offset, f); } - - return 0; + /* Unreachable */ + abort (); } static int @@ -1130,6 +1106,7 @@ recv_request_send_reply (struct connection *conn) else { lock_request (conn); error = handle_request (conn, cmd, flags, offset, count, buf); + assert ((int) error >= 0); unlock_request (conn); } diff --git a/src/filters.c b/src/filters.c index 40c4913..1003dc7 100644 --- a/src/filters.c +++ b/src/filters.c @@ -107,14 +107,6 @@ filter_thread_model (struct backend *b) /* These are actually passing through to the final plugin, hence * the function names. */ -static int -plugin_errno_is_preserved (struct backend *b) -{ - struct backend_filter *f = container_of (b, struct backend_filter, backend); - - return f->backend.next->errno_is_preserved (f->backend.next); -} - static const char * plugin_name (struct backend *b) { @@ -463,17 +455,23 @@ filter_pread (struct backend *b, struct connection *conn, struct backend_filter *f = container_of (b, struct backend_filter, backend); void *handle = connection_get_handle (conn, f->backend.i); struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; + int r; assert (flags == 0); debug ("pread count=%" PRIu32 " offset=%" PRIu64, count, offset); - if (f->filter.pread) - return f->filter.pread (&next_ops, &nxdata, handle, - buf, count, offset); + if (f->filter.pread) { + r = f->filter.pread (&next_ops, &nxdata, handle, + buf, count, offset); + if (r < 0) + r = errno; + } else - return f->backend.next->pread (f->backend.next, conn, - buf, count, offset, flags); + r = f->backend.next->pread (f->backend.next, conn, + buf, count, offset, flags); + assert (r >= 0); + return r; } static int @@ -485,18 +483,24 @@ filter_pwrite (struct backend *b, struct connection *conn, void *handle = connection_get_handle (conn, f->backend.i); struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; bool fua = flags & NBDKIT_FLAG_FUA; + int r; assert (!(flags & ~NBDKIT_FLAG_FUA)); debug ("pwrite count=%" PRIu32 " offset=%" PRIu64 " fua=%d", count, offset, fua); - if (f->filter.pwrite) - return f->filter.pwrite (&next_ops, &nxdata, handle, - buf, count, offset); + if (f->filter.pwrite) { + r = f->filter.pwrite (&next_ops, &nxdata, handle, + buf, count, offset); + if (r < 0) + r = errno; + } else - return f->backend.next->pwrite (f->backend.next, conn, - buf, count, offset, flags); + r = f->backend.next->pwrite (f->backend.next, conn, + buf, count, offset, flags); + assert (r >= 0); + return r; } static int @@ -505,15 +509,21 @@ filter_flush (struct backend *b, struct connection *conn, uint32_t flags) struct backend_filter *f = container_of (b, struct backend_filter, backend); void *handle = connection_get_handle (conn, f->backend.i); struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; + int r; assert (flags == 0); debug ("flush"); - if (f->filter.flush) - return f->filter.flush (&next_ops, &nxdata, handle); + if (f->filter.flush) { + r= f->filter.flush (&next_ops, &nxdata, handle); + if (r < 0) + r = errno; + } else - return f->backend.next->flush (f->backend.next, conn, flags); + r = f->backend.next->flush (f->backend.next, conn, flags); + assert (r >= 0); + return r; } static int @@ -524,15 +534,21 @@ filter_trim (struct backend *b, struct connection *conn, struct backend_filter *f = container_of (b, struct backend_filter, backend); void *handle = connection_get_handle (conn, f->backend.i); struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; + int r; assert (flags == 0); debug ("trim count=%" PRIu32 " offset=%" PRIu64, count, offset); - if (f->filter.trim) - return f->filter.trim (&next_ops, &nxdata, handle, count, offset); + if (f->filter.trim) { + r = f->filter.trim (&next_ops, &nxdata, handle, count, offset); + if (r < 0) + r = errno; + } else - return f->backend.next->trim (f->backend.next, conn, count, offset, flags); + r = f->backend.next->trim (f->backend.next, conn, count, offset, flags); + assert (r >= 0); + return r; } static int @@ -543,18 +559,24 @@ filter_zero (struct backend *b, struct connection *conn, void *handle = connection_get_handle (conn, f->backend.i); struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; int may_trim = (flags & NBDKIT_FLAG_MAY_TRIM) != 0; + int r; assert (!(flags & ~(NBDKIT_FLAG_MAY_TRIM | NBDKIT_FLAG_FUA))); debug ("zero count=%" PRIu32 " offset=%" PRIu64 " may_trim=%d", count, offset, may_trim); - if (f->filter.zero) - return f->filter.zero (&next_ops, &nxdata, handle, - count, offset, may_trim); + if (f->filter.zero) { + r = f->filter.zero (&next_ops, &nxdata, handle, + count, offset, may_trim); + if (r < 0) + r = errno; + } else - return f->backend.next->zero (f->backend.next, conn, - count, offset, flags); + r = f->backend.next->zero (f->backend.next, conn, + count, offset, flags); + assert (r >= 0 && r != ENOTSUP); + return r; } static struct backend filter_functions = { @@ -567,7 +589,6 @@ static struct backend filter_functions = { .dump_fields = filter_dump_fields, .config = filter_config, .config_complete = filter_config_complete, - .errno_is_preserved = plugin_errno_is_preserved, .open = filter_open, .prepare = filter_prepare, .finalize = filter_finalize, diff --git a/src/plugins.c b/src/plugins.c index dba3e24..c49c0f0 100644 --- a/src/plugins.c +++ b/src/plugins.c @@ -227,14 +227,6 @@ plugin_config_complete (struct backend *b) exit (EXIT_FAILURE); } -static int -plugin_errno_is_preserved (struct backend *b) -{ - struct backend_plugin *p = container_of (b, struct backend_plugin, backend); - - return p->plugin.errno_is_preserved; -} - static int plugin_open (struct backend *b, struct connection *conn, int readonly) { @@ -358,11 +350,24 @@ plugin_can_trim (struct backend *b, struct connection *conn) return p->plugin.trim != NULL; } +/* Grab the appropriate error value. + */ +static int +get_error (struct backend_plugin *p) +{ + int ret = threadlocal_get_error (); + + if (!ret && p->plugin.errno_is_preserved) + ret = errno; + return ret ? ret : EIO; +} + static int plugin_pread (struct backend *b, struct connection *conn, void *buf, uint32_t count, uint64_t offset, uint32_t flags) { struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + int r; assert (connection_get_handle (conn, 0)); assert (p->plugin.pread != NULL); @@ -370,25 +375,29 @@ plugin_pread (struct backend *b, struct connection *conn, debug ("pread count=%" PRIu32 " offset=%" PRIu64, count, offset); - return p->plugin.pread (connection_get_handle (conn, 0), buf, count, offset); + r = p->plugin.pread (connection_get_handle (conn, 0), buf, count, offset); + if (r < 0) + r = get_error (p); + return r; } static int plugin_flush (struct backend *b, struct connection *conn, uint32_t flags) { struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + int r; assert (connection_get_handle (conn, 0)); assert (!flags); debug ("flush"); - if (p->plugin.flush != NULL) - return p->plugin.flush (connection_get_handle (conn, 0)); - else { - errno = EINVAL; - return -1; - } + if (!p->plugin.flush) + return EINVAL; + r = p->plugin.flush (connection_get_handle (conn, 0)); + if (r < 0) + r = get_error (p); + return r; } static int @@ -408,14 +417,14 @@ plugin_pwrite (struct backend *b, struct connection *conn, if (p->plugin.pwrite != NULL) r = p->plugin.pwrite (connection_get_handle (conn, 0), buf, count, offset); - else { - errno = EROFS; - return -1; - } + else + return EROFS; if (r == 0 && fua) { assert (p->plugin.flush); r = plugin_flush (b, conn, 0); } + if (r < 0) + r = get_error (p); return r; } @@ -435,14 +444,14 @@ plugin_trim (struct backend *b, struct connection *conn, if (p->plugin.trim != NULL) r = p->plugin.trim (connection_get_handle (conn, 0), count, offset); - else { - errno = EINVAL; - return -1; - } + else + return EINVAL; if (r == 0 && fua) { assert (p->plugin.flush); r = plugin_flush (b, conn, 0); } + if (r < 0) + r = get_error (p); return r; } @@ -472,7 +481,7 @@ plugin_zero (struct backend *b, struct connection *conn, count, offset, may_trim); if (result == -1) { err = threadlocal_get_error (); - if (!err && plugin_errno_is_preserved (b)) + if (!err && p->plugin.errno_is_preserved) err = errno; } if (result == 0 || err != EOPNOTSUPP) @@ -483,10 +492,8 @@ plugin_zero (struct backend *b, struct connection *conn, threadlocal_set_error (0); limit = count < MAX_REQUEST_SIZE ? count : MAX_REQUEST_SIZE; buf = calloc (limit, 1); - if (!buf) { - errno = ENOMEM; - return -1; - } + if (!buf) + return ENOMEM; while (count) { result = p->plugin.pwrite (connection_get_handle (conn, 0), @@ -507,6 +514,8 @@ plugin_zero (struct backend *b, struct connection *conn, assert (p->plugin.flush); result = plugin_flush (b, conn, 0); } + if (result < 0) + result = get_error (p); return result; } @@ -520,7 +529,6 @@ static struct backend plugin_functions = { .dump_fields = plugin_dump_fields, .config = plugin_config, .config_complete = plugin_config_complete, - .errno_is_preserved = plugin_errno_is_preserved, .open = plugin_open, .prepare = plugin_prepare, .finalize = plugin_finalize, diff --git a/filters/cache/cache.c b/filters/cache/cache.c index 9473f2c..2ae6f01 100644 --- a/filters/cache/cache.c +++ b/filters/cache/cache.c @@ -292,6 +292,7 @@ blk_writethrough (struct nbdkit_next_ops *next_ops, void *nxdata, uint64_t blknum, const uint8_t *block) { off_t offset = blknum * BLKSIZE; + int r; nbdkit_debug ("cache: blk_writethrough block %" PRIu64 " (offset %" PRIu64 ")", @@ -302,8 +303,9 @@ blk_writethrough (struct nbdkit_next_ops *next_ops, void *nxdata, return -1; } - if (next_ops->pwrite (nxdata, block, BLKSIZE, offset) == -1) - return -1; + r = next_ops->pwrite (nxdata, block, BLKSIZE, offset); + if (r) + return r; blk_set_bitmap_entry (blknum, BLOCK_CLEAN); @@ -341,6 +343,7 @@ cache_pread (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, void *buf, uint32_t count, uint64_t offset) { uint8_t *block; + int r; block = malloc (BLKSIZE); if (block == NULL) { @@ -357,9 +360,10 @@ cache_pread (struct nbdkit_next_ops *next_ops, void *nxdata, if (n > count) n = count; - if (blk_read (next_ops, nxdata, blknum, block) == -1) { + r = blk_read (next_ops, nxdata, blknum, block); + if (r) { free (block); - return -1; + return r; } memcpy (buf, &block[blkoffs], n); @@ -379,6 +383,7 @@ cache_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, const void *buf, uint32_t count, uint64_t offset) { uint8_t *block; + int r; block = malloc (BLKSIZE); if (block == NULL) { @@ -396,14 +401,16 @@ cache_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, n = count; /* Do a read-modify-write operation on the current block. */ - if (blk_read (next_ops, nxdata, blknum, block) == -1) { + r = blk_read (next_ops, nxdata, blknum, block); + if (r) { free (block); - return -1; + return r; } memcpy (&block[blkoffs], buf, n); - if (blk_writeback (next_ops, nxdata, blknum, block) == -1) { + r = blk_writeback (next_ops, nxdata, blknum, block); + if (r) { free (block); - return -1; + return r; } buf += n; @@ -421,6 +428,7 @@ cache_zero (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, uint32_t count, uint64_t offset, int may_trim) { uint8_t *block; + int r; block = malloc (BLKSIZE); if (block == NULL) { @@ -437,14 +445,16 @@ cache_zero (struct nbdkit_next_ops *next_ops, void *nxdata, if (n > count) n = count; - if (blk_read (next_ops, nxdata, blknum, block) == -1) { + r = blk_read (next_ops, nxdata, blknum, block); + if (r) { free (block); - return -1; + return r; } memset (&block[blkoffs], 0, n); - if (blk_writeback (next_ops, nxdata, blknum, block) == -1) { + r = blk_writeback (next_ops, nxdata, blknum, block); + if (r) { free (block); - return -1; + return r; } count -= n; @@ -463,7 +473,7 @@ cache_flush (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) uint64_t i, j; uint64_t blknum; enum bm_entry state; - unsigned errors = 0; + int error = 0, r; if (cache_mode == CACHE_MODE_UNSAFE) return 0; @@ -494,10 +504,10 @@ cache_flush (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) /* Perform a read + writethrough which will read from the * cache and write it through to the underlying storage. */ - if (blk_read (next_ops, nxdata, blknum, block) == -1 || - blk_writethrough (next_ops, nxdata, blknum, block)) { + if ((r = blk_read (next_ops, nxdata, blknum, block)) || + (r = blk_writethrough (next_ops, nxdata, blknum, block))) { nbdkit_error ("cache: flush of block %" PRIu64 " failed", blknum); - errors++; + error = error ? error : r; } } } @@ -507,10 +517,11 @@ cache_flush (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) free (block); /* Now issue a flush request to the underlying storage. */ - if (next_ops->flush (nxdata) == -1) - errors++; + r = next_ops->flush (nxdata); + if (r) + error = error ? error : r; - return errors == 0 ? 0 : -1; + return error; } static struct nbdkit_filter filter = { diff --git a/filters/cow/cow.c b/filters/cow/cow.c index 5c2fcd0..18debf7 100644 --- a/filters/cow/cow.c +++ b/filters/cow/cow.c @@ -313,6 +313,7 @@ cow_pread (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, void *buf, uint32_t count, uint64_t offset) { uint8_t *block; + int r; block = malloc (BLKSIZE); if (block == NULL) { @@ -329,9 +330,10 @@ cow_pread (struct nbdkit_next_ops *next_ops, void *nxdata, if (n > count) n = count; - if (blk_read (next_ops, nxdata, blknum, block) == -1) { + r = blk_read (next_ops, nxdata, blknum, block); + if (r) { free (block); - return -1; + return r; } memcpy (buf, &block[blkoffs], n); @@ -351,6 +353,7 @@ cow_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, const void *buf, uint32_t count, uint64_t offset) { uint8_t *block; + int r; block = malloc (BLKSIZE); if (block == NULL) { @@ -368,14 +371,16 @@ cow_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, n = count; /* Do a read-modify-write operation on the current block. */ - if (blk_read (next_ops, nxdata, blknum, block) == -1) { + r = blk_read (next_ops, nxdata, blknum, block); + if (r) { free (block); - return -1; + return r; } memcpy (&block[blkoffs], buf, n); - if (blk_write (blknum, block) == -1) { + r = blk_write (blknum, block); + if (r) { free (block); - return -1; + return r; } buf += n; @@ -393,6 +398,7 @@ cow_zero (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, uint32_t count, uint64_t offset, int may_trim) { uint8_t *block; + int r; block = malloc (BLKSIZE); if (block == NULL) { @@ -412,14 +418,16 @@ cow_zero (struct nbdkit_next_ops *next_ops, void *nxdata, /* XXX There is the possibility of optimizing this: ONLY if we are * writing a whole, aligned block, then use FALLOC_FL_ZERO_RANGE. */ - if (blk_read (next_ops, nxdata, blknum, block) == -1) { + r = blk_read (next_ops, nxdata, blknum, block); + if (r) { free (block); - return -1; + return r; } memset (&block[blkoffs], 0, n); - if (blk_write (blknum, block) == -1) { + r = blk_write (blknum, block); + if (r) { free (block); - return -1; + return r; } count -= n; diff --git a/filters/partition/partition.c b/filters/partition/partition.c index f74b3af..bf5238d 100644 --- a/filters/partition/partition.c +++ b/filters/partition/partition.c @@ -216,7 +216,7 @@ find_gpt_partition (struct nbdkit_next_ops *next_ops, void *nxdata, * partition_prepare call above. */ if (next_ops->pread (nxdata, partition_bytes, sizeof partition_bytes, - 2*512 + i*128) == -1) + 2*512 + i*128)) return -1; get_gpt_partition (partition_bytes, &partition); if (memcmp (partition.partition_type_guid, -- 2.14.3
Eric Blake
2018-Feb-01 03:26 UTC
[Libguestfs] [nbdkit PATCH v2 2/3] filters: Add log filter
'nbdkit -v' is quite verbose, and traces everything. Sometimes, we want to trace JUST the client interactions. In particular, when debugging another filter, being able to trace what the filter called can be quite useful; and having the log filter in place will make it easier to add testsuite coverage of other filters (this patch includes a rudimentary test of just the log filter). Also, it is nice to have timestamps in the log, in order to see if message interleaving takes place, and what delays are occurring. The filter requires a logfile; I debated making the filter fall back to logging to stderr if no logfile parameter was found, but while that works for 'nbdkit -f', it fails miserably when we daemonize (since stderr is redirected to /dev/null in that setup). I've marked places where the logging could improve once we update the filter API to more fully reflect the backend API, as part of improving FUA support. Signed-off-by: Eric Blake <eblake at redhat.com> --- TODO | 2 - docs/nbdkit-filter.pod | 1 + docs/nbdkit.pod | 1 + filters/log/nbdkit-log-filter.pod | 115 ++++++++++++ configure.ac | 1 + filters/log/log.c | 362 ++++++++++++++++++++++++++++++++++++++ filters/Makefile.am | 1 + filters/log/Makefile.am | 62 +++++++ tests/Makefile.am | 4 + tests/test-log.sh | 88 +++++++++ 10 files changed, 635 insertions(+), 2 deletions(-) create mode 100644 filters/log/nbdkit-log-filter.pod create mode 100644 filters/log/log.c create mode 100644 filters/log/Makefile.am create mode 100755 tests/test-log.sh diff --git a/TODO b/TODO index 4731b1e..a691ff3 100644 --- a/TODO +++ b/TODO @@ -77,8 +77,6 @@ Suggestions for filters * injecting artificial errors or otherwise masking plugin features (such as hiding zero support) for testing clients -* logging all client commands - * fua filter: setting mode=none stops advertisement, mode=emulate uses flush emulation (or fails if !can_flush), mode=native passes on through (or fails if can_fua does not report native support). When diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index 4ddf25d..a2babef 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -541,6 +541,7 @@ Filters: L<nbdkit-cache-filter(1)>, L<nbdkit-cow-filter(1)>, L<nbdkit-delay-filter(1)>, +L<nbdkit-log-filter(1)>, L<nbdkit-offset-filter(1)>, L<nbdkit-partition-filter(1)>. diff --git a/docs/nbdkit.pod b/docs/nbdkit.pod index 1167245..22d91e7 100644 --- a/docs/nbdkit.pod +++ b/docs/nbdkit.pod @@ -920,6 +920,7 @@ Filters: L<nbdkit-cache-filter(1)>, L<nbdkit-cow-filter(1)>, L<nbdkit-delay-filter(1)>, +L<nbdkit-log-filter(1)>, L<nbdkit-offset-filter(1)>, L<nbdkit-partition-filter(1)>. diff --git a/filters/log/nbdkit-log-filter.pod b/filters/log/nbdkit-log-filter.pod new file mode 100644 index 0000000..f3484fe --- /dev/null +++ b/filters/log/nbdkit-log-filter.pod @@ -0,0 +1,115 @@ +=encoding utf8 + +=head1 NAME + +nbdkit-log-filter - nbdkit log filter + +=head1 SYNOPSIS + + nbdkit --filter=log plugin logfile=FILE [plugin-args...] + +=head1 DESCRIPTION + +C<nbdkit-log-filter> is a filter that logs all transactions. When +used as the first filter, it can show the original client requests; as +a later filter, it can show how earlier filters have modified the +original request. The log results are placed in a user-specified +file; for more details on the log format, see L<FILES>. Note that +using C<nbdkit -v> produces much more verbose logging details to +stderr about every aspect of nbdkit operation. + +=head1 PARAMETERS + +The nbdkit-log-filter requires a single parameter C<logfile> which +specifies the path of the file to use for logging. If the file +already exists, it will be truncated. + +=head1 EXAMPLES + +Serve the file F<disk.img>, and log each client transaction in the +file F<disk.log>: + + nbdkit --filter=log file file=disk.img logfile=disk.log + +Repeat the task, but with the cow (copy-on-write) filter to perform +local caching of data served from the original plugin: + + nbdkit --filter=cow --filter=log file file=disk.img logfile=disk.log2 + +After running a client that performs the same operations under each of +the two servers, you can compare F<disk.log> and F<disk.log2> to see +the impact of the caching. + +=head1 FILES + +This filter writes to the file specified by the C<logfile=FILE> +parameter. All lines include a timestamp, a connection counter, then +details about the command. The following actions are logged: Connect, +Read, Write, Zero, Trim, Flush, and Disconnect. Except for Connect +and Disconnect, an event is logged across two lines for call and +return value, to allow tracking duration and tracing any parallel +execution, using id for correlation (incremented per action on the +connection). + +An example logging session of a client that performs a single +successful read is: + + 2018-01-27 20:38:22.959984 connection=1 Connect size=0x400 write=1 flush=1 rotational=0 trim=0 + 2018-01-27 20:38:23.001720 connection=1 Read id=1 offset=0x0 count=0x100 ... + 2018-01-27 20:38:23.001995 connection=1 ...Read id=1 return=0 (Success) + 2018-01-27 20:38:23.044259 connection=1 Disconnect transactions=1 + +=head1 SEE ALSO + +L<nbdkit(1)>, +L<nbdkit-file-plugin(1)>, +L<nbdkit-cow-filter(1)>, +L<nbdkit-filter(3)>. + +=head1 AUTHORS + +Eric Blake + +=head1 COPYRIGHT + +Copyright (C) 2018 Red Hat Inc. + +=head1 LICENSE + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +=over 4 + +=item * + +Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + +=item * + +Redistributions in binary form must reproduce the above copyright +notice, this list of conditions and the following disclaimer in the +documentation and/or other materials provided with the distribution. + +=item * + +Neither the name of Red Hat nor the names of its contributors may be +used to endorse or promote products derived from this software without +specific prior written permission. + +=back + +THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +SUCH DAMAGE. diff --git a/configure.ac b/configure.ac index 6025ce0..3fcc776 100644 --- a/configure.ac +++ b/configure.ac @@ -516,6 +516,7 @@ AC_CONFIG_FILES([Makefile filters/cache/Makefile filters/cow/Makefile filters/delay/Makefile + filters/log/Makefile filters/offset/Makefile filters/partition/Makefile src/Makefile diff --git a/filters/log/log.c b/filters/log/log.c new file mode 100644 index 0000000..90920b5 --- /dev/null +++ b/filters/log/log.c @@ -0,0 +1,362 @@ +/* nbdkit + * Copyright (C) 2018 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <string.h> +#include <stdarg.h> +#include <errno.h> +#include <inttypes.h> +#include <pthread.h> +#include <sys/time.h> +#include <assert.h> + +#include <nbdkit-filter.h> + +#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL + +static uint64_t connections; +static char *logfilename; +static FILE *logfile; +static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; + +static void +log_unload (void) +{ + if (logfilename) + fclose (logfile); + free (logfilename); +} + +/* Called for each key=value passed on the command line. */ +static int +log_config (nbdkit_next_config *next, void *nxdata, + const char *key, const char *value) +{ + if (strcmp (key, "logfile") == 0) { + free (logfilename); + logfilename = strdup (value); + if (!logfilename) { + nbdkit_error ("strdup: %m"); + return -1; + } + return 0; + } + return next (nxdata, key, value); +} + +/* Open the logfile. */ +static int +log_config_complete (nbdkit_next_config_complete *next, void *nxdata) +{ + if (!logfilename) { + nbdkit_error ("missing logfile= parameter for the log filter"); + return -1; + } + logfile = fopen (logfilename, "w"); + if (!logfile) { + nbdkit_error ("fopen: %m"); + return -1; + } + + return next (nxdata); +} + +#define log_config_help \ + "logfile=<FILE> The file to place the log in." + +struct handle { + uint64_t connection; + uint64_t id; +}; + +/* Compute the next id number on the current connection. */ +static uint64_t +get_id (struct handle *h) +{ + uint64_t r; + + pthread_mutex_lock (&lock); + r = ++h->id; + pthread_mutex_unlock (&lock); + return r; +} + +/* Output a timestamp and the log message. */ +static void __attribute__ ((format (printf, 4, 5))) +output (struct handle *h, const char *act, uint64_t id, const char *fmt, ...) +{ + va_list args; + struct timeval tv; + struct tm tm; + char timestamp[27] = "Time unknown"; + + /* Logging is best effort, so ignore failure to get timestamp */ + if (!gettimeofday (&tv, NULL)) + { + size_t s; + + gmtime_r (&tv.tv_sec, &tm); + s = strftime (timestamp, sizeof timestamp, "%F %T", &tm); + assert (s); + s = snprintf (timestamp + s, sizeof timestamp - s, ".%06ld", + 0L + tv.tv_usec); + } + flockfile (logfile); + fprintf (logfile, "%s connection=%" PRIu64 " %s ", timestamp, h->connection, + act); + if (id) + fprintf (logfile, "id=%" PRIu64 " ", id); + va_start (args, fmt); + vfprintf (logfile, fmt, args); + va_end (args); + fputc ('\n', logfile); + fflush (logfile); + funlockfile (logfile); +} + +/* Shared code for a nicer log of return value */ +static void +output_return (struct handle *h, const char *act, uint64_t id, int r) +{ + const char *s = "Other=>EINVAL"; + + /* Only decode what connections.c:nbd_errno() recognizes */ + switch (r) { + case 0: + s = "Success"; + break; + case EROFS: + s = "EROFS=>EPERM"; + break; + case EPERM: + s = "EPERM"; + break; + case EIO: + s = "EIO"; + break; + case ENOMEM: + s = "ENOMEM"; + break; +#ifdef EDQUOT + case EDQUOT: + s = "EDQUOT=>ENOSPC"; + break; +#endif + case EFBIG: + s = "EFBIG=>ENOSPC"; + break; + case ENOSPC: + s = "ENOSPC"; + break; +#ifdef ESHUTDOWN + case ESHUTDOWN: + s = "ESHUTDOWN"; + break; +#endif + case EINVAL: + s = "EINVAL"; + break; + } + output (h, act, id, "return=%d (%s)", r, s); +} + +/* Open a connection. */ +static void * +log_open (nbdkit_next_open *next, void *nxdata, int readonly) +{ + struct handle *h; + + if (next (nxdata, readonly) == -1) + return NULL; + + h = malloc (sizeof *h); + if (h == NULL) { + nbdkit_error ("malloc: %m"); + return NULL; + } + + pthread_mutex_lock (&lock); + h->connection = ++connections; + pthread_mutex_unlock (&lock); + h->id = 0; + return h; +} + +static void +log_close (void *handle) +{ + struct handle *h = handle; + + free (h); +} + +static int +log_prepare (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) +{ + struct handle *h = handle; + int64_t size = next_ops->get_size (nxdata); + int w = next_ops->can_write (nxdata); + int f = next_ops->can_flush (nxdata); + int r = next_ops->is_rotational (nxdata); + int t = next_ops->can_trim (nxdata); + + if (size < 0 || w < 0 || f < 0 || r < 0 || t < 0) + return -1; + + /* TODO expose can_zero, can_fua to filters */ + output (h, "Connect", 0, "size=0x%" PRIx64 " write=%d flush=%d " + "rotational=%d trim=%d" /* zero=? fua=? */, size, w, f, r, t); + return 0; +} + +static int +log_finalize (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) +{ + struct handle *h = handle; + + output (h, "Disconnect", 0, "transactions=%" PRId64, h->id); + return 0; +} + +/* Read data. */ +static int +log_pread (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, void *buf, uint32_t count, uint64_t offs) +{ + struct handle *h = handle; + uint64_t id = get_id (h); + int r; + + /* TODO expose flags to filters */ + /* assert (!flags); */ + output (h, "Read", id, "offset=0x%" PRIx64 " count=0x%x ...", + offs, count); + r = next_ops->pread (nxdata, buf, count, offs); + output_return (h, "...Read", id, r); + return r; +} + +/* Write data. */ +static int +log_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, const void *buf, uint32_t count, uint64_t offs) +{ + struct handle *h = handle; + uint64_t id = get_id (h); + int r; + + /* TODO expose flags to filters */ + /* assert (!(flags & ~fua)) */ + output (h, "Write", id, "offset=0x%" PRIx64 " count=0x%x fua=? ...", + offs, count); + r = next_ops->pwrite (nxdata, buf, count, offs); + output_return (h, "...Write", id, r); + return r; +} + +/* Flush. */ +static int +log_flush (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) +{ + struct handle *h = handle; + uint64_t id = get_id (h); + int r; + + /* TODO expose flags to filters */ + /* assert (!flags) */ + output (h, "Flush", id, "..."); + r = next_ops->flush (nxdata); + output_return (h, "...Flush", id, r); + return r; +} + +/* Trim data. */ +static int +log_trim (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offs) +{ + struct handle *h = handle; + uint64_t id = get_id (h); + int r; + + /* TODO expose flags to filters */ + /* assert (!(flags & ~fua)) */ + output (h, "Trim", id, "offset=0x%" PRIx64 " count=0x%x fua=? ...", + offs, count); + r = next_ops->trim (nxdata, count, offs); + output_return (h, "...Trim", id, r); + return r; +} + +/* Zero data. */ +static int +log_zero (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offs, int may_trim) +{ + struct handle *h = handle; + uint64_t id = get_id (h); + int r; + + /* TODO expose FUA flag to filters */ + /* assert (!(flags & ~(fua | trim))) */ + output (h, "Zero", id, "offset=0x%" PRIx64 " count=0x%x trim=%d fua=? ...", + offs, count, may_trim); + r = next_ops->zero (nxdata, count, offs, may_trim); + output_return (h, "...Zero", id, r); + return r; +} + +static struct nbdkit_filter filter = { + .name = "log", + .longname = "nbdkit log filter", + .version = PACKAGE_VERSION, + .config = log_config, + .config_complete = log_config_complete, + .config_help = log_config_help, + .unload = log_unload, + .open = log_open, + .close = log_close, + .prepare = log_prepare, + .finalize = log_finalize, + .pread = log_pread, + .pwrite = log_pwrite, + .flush = log_flush, + .trim = log_trim, + .zero = log_zero, +}; + +NBDKIT_REGISTER_FILTER(filter) diff --git a/filters/Makefile.am b/filters/Makefile.am index 9996d77..8e070e5 100644 --- a/filters/Makefile.am +++ b/filters/Makefile.am @@ -34,5 +34,6 @@ SUBDIRS = \ cache \ cow \ delay \ + log \ offset \ partition diff --git a/filters/log/Makefile.am b/filters/log/Makefile.am new file mode 100644 index 0000000..18a0eba --- /dev/null +++ b/filters/log/Makefile.am @@ -0,0 +1,62 @@ +# nbdkit +# Copyright (C) 2018 Red Hat Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +EXTRA_DIST = nbdkit-log-filter.pod + +CLEANFILES = *~ + +filterdir = $(libdir)/nbdkit/filters + +filter_LTLIBRARIES = nbdkit-log-filter.la + +nbdkit_log_filter_la_SOURCES = \ + log.c \ + $(top_srcdir)/include/nbdkit-filter.h + +nbdkit_log_filter_la_CPPFLAGS = \ + -I$(top_srcdir)/include +nbdkit_log_filter_la_CFLAGS = \ + $(WARNINGS_CFLAGS) +nbdkit_log_filter_la_LDFLAGS = \ + -module -avoid-version -shared + +if HAVE_POD2MAN + +man_MANS = nbdkit-log-filter.1 +CLEANFILES += $(man_MANS) + +nbdkit-log-filter.1: nbdkit-log-filter.pod + $(POD2MAN) $(POD2MAN_ARGS) --section=1 --name=`basename $@ .1` $< $@.t && \ + if grep 'POD ERROR' $@.t; then rm $@.t; exit 1; fi && \ + mv $@.t $@ + +endif diff --git a/tests/Makefile.am b/tests/Makefile.am index 1e32cb6..0d06410 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -43,6 +43,7 @@ EXTRA_DIST = \ test-cache.sh \ test-captive.sh \ test-cow.sh \ + test-log.sh \ test-cxx.sh \ test-dump-config.sh \ test-dump-plugin.sh \ @@ -427,6 +428,9 @@ test_delay_SOURCES = test-delay.c test.h test_delay_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS) test_delay_LDADD = libtest.la $(LIBGUESTFS_LIBS) +# log filter test. +TESTS += test-log.sh + # offset filter test. check_DATA += offset-data MAINTAINERCLEANFILES += offset-data diff --git a/tests/test-log.sh b/tests/test-log.sh new file mode 100755 index 0000000..1fa03b6 --- /dev/null +++ b/tests/test-log.sh @@ -0,0 +1,88 @@ +#!/bin/bash - +# nbdkit +# Copyright (C) 2018 Red Hat Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +set -e + +files="log.img log.log log.sock log.pid" +rm -f $files + +: ${QEMU_IO=qemu-io} + +# Test that qemu-io works +truncate --size 10M log.img +if ! $QEMU_IO -f raw -c 'w 1M 2M' log.img; then + echo "$0: missing or broken qemu-io" + exit 77 +fi + +# Run nbdkit with logging enabled to file. +nbdkit -P log.pid -U log.sock --filter=log file file=log.img logfile=log.log + +# We may have to wait a short time for the pid file to appear. +for i in `seq 1 10`; do + if test -f log.pid; then + break + fi + sleep 1 +done +if ! test -f log.pid; then + echo "$0: PID file was not created" + exit 1 +fi + +pid="$(cat log.pid)" + +# Kill the nbdkit process on exit. +cleanup () +{ + status=$? + + kill $pid + # For easier debugging, dump the final log file before removing it. + echo "Log file contents:" + cat log.log + rm -f $files + + exit $status +} +trap cleanup INT QUIT TERM EXIT ERR + +# Write, then read some data in the file. +$QEMU_IO -f raw -c 'w -P 11 1M 2M' 'nbd+unix://?socket=log.sock' +$QEMU_IO -r -f raw -c 'r -P 11 2M 1M' 'nbd+unix://?socket=log.sock' + +# The log should show a write on connection 1, and read on connection 2. +grep 'connection=1 Write id=1 offset=0x100000 count=0x200000' log.log +grep 'connection=2 Read id=1 offset=0x200000 count=0x100000' log.log + +# The cleanup() function is called implicitly on exit. -- 2.14.3
Eric Blake
2018-Feb-01 03:26 UTC
[Libguestfs] [nbdkit PATCH v2 3/3] filters: Add blocksize filter
The upstream NBD protocol recently clarified that servers can advertise block size limitations to clients that ask with NBD_OPT_GO (although we're still a ways off from implementing that in nbdkit); and that in the absence of that, then clients should agree on limits using out-of-band information or stick to sane defaults (everything 512-byte-aligned, no reads or writes larger than 32M). But the protocol is not inherently prevented from serving 1-byte requests, nor does it prohibit a request of nearly 4G on bulk actions like trim and zero; and nbdkit itself supports 1-byte requests, read and write up to 64M, and no limit on trim or zero, even though these requests may fail miserably on some plugins. So, nbdkit should make it easier to plug together clients and servers that have different notions of blocksize limitations. A new blocksize filter makes it possible to specify a minimum blocksize handed to the plugin (everything smaller is rounded out, using read-modify-write as needed), and maximum limits (maxdata for read/write since they have a buffer, and maxlen for zero/trim since they do not). Testing is easy by reusing the log filter and observing that requests were rewritten as expected. Signed-off-by: Eric Blake <eblake at redhat.com> --- TODO | 5 - docs/nbdkit-filter.pod | 1 + docs/nbdkit.pod | 1 + filters/blocksize/nbdkit-blocksize-filter.pod | 141 ++++++++++ configure.ac | 3 +- filters/blocksize/blocksize.c | 379 ++++++++++++++++++++++++++ filters/Makefile.am | 1 + filters/blocksize/Makefile.am | 62 +++++ tests/Makefile.am | 4 + tests/test-blocksize.sh | 152 +++++++++++ 10 files changed, 743 insertions(+), 6 deletions(-) create mode 100644 filters/blocksize/nbdkit-blocksize-filter.pod create mode 100644 filters/blocksize/blocksize.c create mode 100644 filters/blocksize/Makefile.am create mode 100755 tests/test-blocksize.sh diff --git a/TODO b/TODO index a691ff3..d02671d 100644 --- a/TODO +++ b/TODO @@ -86,11 +86,6 @@ Suggestions for filters unneeded intermediate flushing; hence, where this filter is placed in the stack may have a performance impact. -* blocksize filter: setting minblock performs read-modify-write of - requests that are too small or unaligned for the plugin; setting - maxdata breaks up too-large read/write; setting maxlen breaks up - too-large trim/zero - nbdkit-cache-filter needs considerable work: * allow the user to limit the total size of the cache diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index a2babef..4fe0789 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -538,6 +538,7 @@ L<nbdkit-plugin(1)>. Filters: +L<nbdkit-blocksize-filter(1)>, L<nbdkit-cache-filter(1)>, L<nbdkit-cow-filter(1)>, L<nbdkit-delay-filter(1)>, diff --git a/docs/nbdkit.pod b/docs/nbdkit.pod index 22d91e7..94ddb62 100644 --- a/docs/nbdkit.pod +++ b/docs/nbdkit.pod @@ -917,6 +917,7 @@ L<nbdkit-xz-plugin(1)>. Filters: +L<nbdkit-blocksize-filter(1)>, L<nbdkit-cache-filter(1)>, L<nbdkit-cow-filter(1)>, L<nbdkit-delay-filter(1)>, diff --git a/filters/blocksize/nbdkit-blocksize-filter.pod b/filters/blocksize/nbdkit-blocksize-filter.pod new file mode 100644 index 0000000..1433187 --- /dev/null +++ b/filters/blocksize/nbdkit-blocksize-filter.pod @@ -0,0 +1,141 @@ +=encoding utf8 + +=head1 NAME + +nbdkit-blocksize-filter - nbdkit blocksize filter + +=head1 SYNOPSIS + + nbdkit --filter=blocksize plugin [minblock=SIZE] [maxdata=SIZE] \ + [maxlen=SIZE] [plugin-args...] + +=head1 DESCRIPTION + +C<nbdkit-blocksize-filter> is a filter that ensures various block size +limits are met on transactions presented to the plugin. The NBD +protocol permits clients to send requests with a granularity as small +as 1 byte or as large as nearly 4 gigabytes, although it suggests that +portable clients should align requests to 512 bytes and not exceed 32 +megabytes without prior coordination with the server. + +Meanwhile, some plugins require requests to be aligned to 512-byte +multiples, or may enforce a maximum transaction size to bound the time +or memory resources spent by any one command (note that nbdkit itself +refuses a read or write larger than 64 megabytes, while many other NBD +servers limit things to 32 megabytes). The blocksize filter can be +used to modify the client requests to meet the plugin restrictions. + +=head1 PARAMETERS + +The nbdkit-blocksize-filter accepts the following parameters. + +=over 4 + +=item B<minblock=SIZE> + +The minimum block size and alignment to pass to the plugin. This must +be a power of two, and no larger than 64k. If omitted, this defaults +to 1 (that is, no minimum size restrictions). The filter rounds up +read requests to alignment boundaries, performs read-modify-write +cycles for any unaligned head or tail of a write or zero request, and +silently ignores any unaligned head or tail of a trim request. The +filter also truncates the plugin size down to an aligned value (as it +cannot safely operate on the unaligned tail); it is an error if this +would result in a size of 0. + +This parameter understands the suffix 'k' for 1024. + +=item B<maxdata=SIZE> + +The maximum block size for any single transaction with data (read and +write). If omitted, this defaults to 64 megabytes (that is, the +nbdkit maximum). This need not be a power of two, but must be an +integer multiple of C<minblock>. The filter fragments any larger +client request into multiple plugin requests. + +This parameter understands the suffixes 'k', 'M', and 'G' for powers +of 1024. + +=item B<maxlen=SIZE> + +The maximum length for any single transaction without data (trim and +zero). If omitted, this defaults to 0xffffffff rounded down to +C<minsize> (that is, the inherent 32-bit limit of the NBD protocol). +This need not be a power of two, but must be an integer multiple of +C<minblock>, and should be at least as large as C<maxdata>. The +filter fragments any larger client request into multiple plugin +requests. + +This parameter understands the suffixes 'k', 'M', and 'G' for powers +of 1024. + +=back + +=head1 EXAMPLES + +Allow an arbitrary client to use the VDDK plugin (which is limited to +512-byte blocks), without having to fix the client to avoid sending +unaligned requests: + + nbdkit --filter=blocksize vddk minblock=512 file=/absolute/path/to/file.vmdk + +Allow an arbitrary client tuned to nbdkit's 64 megabyte sizing to +connect to a remote server that insists on 32 megabyte sizing, via the +nbd plugin: + + nbdkit --filter=blocksize nbd maxdata=32M socket=/path/to/socket + +=head1 SEE ALSO + +L<nbdkit(1)>, +L<nbdkit-nbd-plugin(1)>, +L<nbdkit-vddk-plugin(1)>, +L<nbdkit-filter(3)>. + +=head1 AUTHORS + +Eric Blake + +=head1 COPYRIGHT + +Copyright (C) 2018 Red Hat Inc. + +=head1 LICENSE + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +=over 4 + +=item * + +Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + +=item * + +Redistributions in binary form must reproduce the above copyright +notice, this list of conditions and the following disclaimer in the +documentation and/or other materials provided with the distribution. + +=item * + +Neither the name of Red Hat nor the names of its contributors may be +used to endorse or promote products derived from this software without +specific prior written permission. + +=back + +THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +SUCH DAMAGE. diff --git a/configure.ac b/configure.ac index 3fcc776..c3a121d 100644 --- a/configure.ac +++ b/configure.ac @@ -1,5 +1,5 @@ # nbdkit -# Copyright (C) 2013-2017 Red Hat Inc. +# Copyright (C) 2013-2018 Red Hat Inc. # All rights reserved. # # Redistribution and use in source and binary forms, with or without @@ -513,6 +513,7 @@ AC_CONFIG_FILES([Makefile plugins/vddk/Makefile plugins/xz/Makefile filters/Makefile + filters/blocksize/Makefile filters/cache/Makefile filters/cow/Makefile filters/delay/Makefile diff --git a/filters/blocksize/blocksize.c b/filters/blocksize/blocksize.c new file mode 100644 index 0000000..27dce8e --- /dev/null +++ b/filters/blocksize/blocksize.c @@ -0,0 +1,379 @@ +/* nbdkit + * Copyright (C) 2018 Red Hat Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * * Neither the name of Red Hat nor the names of its contributors may be + * used to endorse or promote products derived from this software without + * specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <stdint.h> +#include <string.h> +#include <inttypes.h> +#include <limits.h> +#include <errno.h> + +#include <nbdkit-filter.h> + +/* XXX See design comment in filters/cow/cow.c. */ +#define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_ALL_REQUESTS + +#define BLOCKSIZE_MIN_LIMIT (64U * 1024) +#define MIN(a, b) ((a) < (b) ? (a) : (b)) + +/* As long as we don't have parallel requests, we can reuse a common + * buffer for alignment purposes; size it to the maximum we allow for + * minblock. */ +static char bounce[BLOCKSIZE_MIN_LIMIT]; +static unsigned int minblock; +static unsigned int maxdata; +static unsigned int maxlen; + +static int +blocksize_parse (const char *name, const char *s, unsigned int *v) +{ + unsigned long l; + char *end; + unsigned int scale = 1; + + errno = 0; + l = strtol (s, &end, 0); + if (errno || s == end) { + nbdkit_error ("unable to parse '%s' for parameter '%s'", s, name); + return -1; + } + if (!l) { + nbdkit_error ("parameter '%s' must not be 0", name); + return -1; + } + switch (*end) { + case 'g': + case 'G': + scale *= 1024; + /* fallthru */ + case 'm': + case 'M': + scale *= 1024; + /* fallthru */ + case 'k': + case 'K': + scale *= 1024; + end++; + } + if (*end) { + nbdkit_error ("unable to parse '%s' for parameter '%s'", s, name); + return -1; + } + if (UINT_MAX / scale < l) { + nbdkit_error ("parameter '%s' too large", name); + return -1; + } + *v = l * scale; + return 0; +} + +/* Called for each key=value passed on the command line. */ +static int +blocksize_config (nbdkit_next_config *next, void *nxdata, + const char *key, const char *value) +{ + + if (strcmp (key, "minblock") == 0) + return blocksize_parse (key, value, &minblock); + if (strcmp (key, "maxdata") == 0) + return blocksize_parse (key, value, &maxdata); + if (strcmp (key, "maxlen") == 0) + return blocksize_parse (key, value, &maxlen); + return next (nxdata, key, value); +} + +/* Check that limits are sane. */ +static int +blocksize_config_complete (nbdkit_next_config_complete *next, void *nxdata) +{ + if (minblock) { + if (minblock & (minblock - 1)) { + nbdkit_error ("minblock must be a power of 2"); + return -1; + } + if (minblock > BLOCKSIZE_MIN_LIMIT) { + nbdkit_error ("minblock must not exceed %u", BLOCKSIZE_MIN_LIMIT); + return -1; + } + } + else + minblock = 1; + + if (maxdata) { + if (maxdata & (minblock - 1)) { + nbdkit_error ("maxdata must be a multiple of %u", minblock); + return -1; + } + } + else + maxdata = 64 * 1024 * 1024; + + if (maxlen) { + if (maxlen & (minblock - 1)) { + nbdkit_error ("maxlen must be a multiple of %u", minblock); + return -1; + } + } + else + maxlen = -minblock; + + return next (nxdata); +} + +#define blocksize_config_help \ + "minblock=<SIZE> Minimum block size, power of 2 <= 64k (default 1).\n" \ + "maxdata=<SIZE> Maximum size for read/write (default 64M).\n" \ + "maxlen=<SIZE> Maximum size for trim/zero (default 4G-minblock)." + +static int +blocksize_prepare (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + /* Early call to get_size to ensure it doesn't truncate to 0. */ + int64_t size = next_ops->get_size (nxdata); + + if (size == -1) + return -1; + if (size < minblock) { + nbdkit_error ("disk is too small for minblock size %u", minblock); + return -1; + } + return 0; +} + +static int64_t +blocksize_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + int64_t size = next_ops->get_size (nxdata); + + return size == -1 ? size : size & ~(minblock - 1); +} + +static int +blocksize_pread (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, void *b, uint32_t count, uint64_t offs) +{ + char *buf = b; + int r; + uint32_t keep; + uint32_t drop; + + /* Unaligned head */ + if (offs & (minblock - 1)) { + drop = offs & (minblock - 1); + keep = MIN (minblock - drop, count); + r = next_ops->pread (nxdata, bounce, minblock, offs - drop); + if (r) + return r; + memcpy (buf, bounce + drop, keep); + buf += keep; + offs += keep; + count -= keep; + } + + /* Unaligned tail */ + if (count & (minblock - 1)) { + keep = count & (minblock - 1); + count -= keep; + r = next_ops->pread (nxdata, bounce, minblock, offs + count); + if (r) + return r; + memcpy (buf + count, bounce, keep); + } + + /* Aligned body */ + while (count) { + keep = MIN (maxdata, count); + r = next_ops->pread (nxdata, buf, keep, offs); + if (r) + return r; + buf += keep; + offs += keep; + count -= keep; + } + + return 0; +} + +static int +blocksize_pwrite (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, const void *b, uint32_t count, uint64_t offs) +{ + const char *buf = b; + int r; + uint32_t keep; + uint32_t drop; + + /* Unaligned head */ + if (offs & (minblock - 1)) { + drop = offs & (minblock - 1); + keep = MIN (minblock - drop, count); + r = next_ops->pread (nxdata, bounce, minblock, offs - drop); + if (r) + return r; + memcpy (bounce + drop, buf, keep); + r = next_ops->pwrite (nxdata, bounce, minblock, offs - drop); + if (r) + return r; + buf += keep; + offs += keep; + count -= keep; + } + + /* Unaligned tail */ + if (count & (minblock - 1)) { + keep = count & (minblock - 1); + count -= keep; + r = next_ops->pread (nxdata, bounce, minblock, offs + count); + if (r) + return r; + memcpy (bounce, buf + count, keep); + r = next_ops->pwrite (nxdata, bounce, minblock, offs + count); + if (r) + return r; + } + + /* Aligned body */ + while (count) { + keep = MIN (maxdata, count); + r = next_ops->pwrite (nxdata, buf, keep, offs); + if (r) + return r; + buf += keep; + offs += keep; + count -= keep; + } + + return 0; +} + +static int +blocksize_trim (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offs) +{ + int r; + uint32_t keep; + + /* Unaligned head */ + if (offs & (minblock - 1)) { + keep = MIN (minblock - (offs & (minblock - 1)), count); + offs += keep; + count -= keep; + } + + /* Unaligned tail */ + if (count & (minblock - 1)) + count -= count & (minblock - 1); + + /* Aligned body */ + while (count) { + keep = MIN (maxlen, count); + r = next_ops->trim (nxdata, keep, offs); + if (r) + return r; + offs += keep; + count -= keep; + } + + return 0; +} + +static int +blocksize_zero (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offs, int may_trim) +{ + int r; + uint32_t keep; + uint32_t drop; + + /* Unaligned head */ + if (offs & (minblock - 1)) { + drop = offs & (minblock - 1); + keep = MIN (minblock - drop, count); + r = next_ops->pread (nxdata, bounce, minblock, offs - drop); + if (r) + return r; + memset (bounce + drop, 0, keep); + r = next_ops->pwrite (nxdata, bounce, minblock, offs - drop); + if (r) + return r; + offs += keep; + count -= keep; + } + + /* Unaligned tail */ + if (count & (minblock - 1)) { + keep = count & (minblock - 1); + count -= keep; + r = next_ops->pread (nxdata, bounce, minblock, offs + count); + if (r) + return r; + memset (bounce, 0, keep); + r = next_ops->pwrite (nxdata, bounce, minblock, offs + count); + if (r) + return r; + } + + /* Aligned body */ + while (count) { + keep = MIN (maxlen, count); + r = next_ops->zero (nxdata, keep, offs, may_trim); + if (r) + return r; + offs += keep; + count -= keep; + } + + return 0; +} + +static struct nbdkit_filter filter = { + .name = "blocksize", + .longname = "nbdkit blocksize filter", + .version = PACKAGE_VERSION, + .config = blocksize_config, + .config_complete = blocksize_config_complete, + .config_help = blocksize_config_help, + .prepare = blocksize_prepare, + .get_size = blocksize_get_size, + .pread = blocksize_pread, + .pwrite = blocksize_pwrite, + .trim = blocksize_trim, + .zero = blocksize_zero, +}; + +NBDKIT_REGISTER_FILTER(filter) diff --git a/filters/Makefile.am b/filters/Makefile.am index 8e070e5..de98f43 100644 --- a/filters/Makefile.am +++ b/filters/Makefile.am @@ -31,6 +31,7 @@ # SUCH DAMAGE. SUBDIRS = \ + blocksize \ cache \ cow \ delay \ diff --git a/filters/blocksize/Makefile.am b/filters/blocksize/Makefile.am new file mode 100644 index 0000000..0069403 --- /dev/null +++ b/filters/blocksize/Makefile.am @@ -0,0 +1,62 @@ +# nbdkit +# Copyright (C) 2018 Red Hat Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +EXTRA_DIST = nbdkit-blocksize-filter.pod + +CLEANFILES = *~ + +filterdir = $(libdir)/nbdkit/filters + +filter_LTLIBRARIES = nbdkit-blocksize-filter.la + +nbdkit_blocksize_filter_la_SOURCES = \ + blocksize.c \ + $(top_srcdir)/include/nbdkit-filter.h + +nbdkit_blocksize_filter_la_CPPFLAGS = \ + -I$(top_srcdir)/include +nbdkit_blocksize_filter_la_CFLAGS = \ + $(WARNINGS_CFLAGS) +nbdkit_blocksize_filter_la_LDFLAGS = \ + -module -avoid-version -shared + +if HAVE_POD2MAN + +man_MANS = nbdkit-blocksize-filter.1 +CLEANFILES += $(man_MANS) + +nbdkit-blocksize-filter.1: nbdkit-blocksize-filter.pod + $(POD2MAN) $(POD2MAN_ARGS) --section=1 --name=`basename $@ .1` $< $@.t && \ + if grep 'POD ERROR' $@.t; then rm $@.t; exit 1; fi && \ + mv $@.t $@ + +endif diff --git a/tests/Makefile.am b/tests/Makefile.am index 0d06410..d214250 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -40,6 +40,7 @@ EXTRA_DIST = \ shebang.pl \ shebang.py \ shebang.rb \ + test-blocksize.sh \ test-cache.sh \ test-captive.sh \ test-cow.sh \ @@ -414,6 +415,9 @@ endif HAVE_RUBY #---------------------------------------------------------------------- # Tests of filters. +# blocksize filter test. +TESTS += test-blocksize.sh + # cache filter test. TESTS += test-cache.sh diff --git a/tests/test-blocksize.sh b/tests/test-blocksize.sh new file mode 100755 index 0000000..6217030 --- /dev/null +++ b/tests/test-blocksize.sh @@ -0,0 +1,152 @@ +#!/bin/bash - +# nbdkit +# Copyright (C) 2018 Red Hat Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +set -e + +files="blocksize1.img blocksize1.log blocksize1.sock blocksize1.pid + blocksize2.img blocksize2.log blocksize2.sock blocksize2.pid" +rm -f $files + +: ${QEMU_IO=qemu-io} + +# Prep images, and check that qemu-io understands the actions we plan on doing. +truncate --size 10M blocksize1.img +if ! $QEMU_IO -f raw -c 'r 0 1' -c 'w -z 1000 2000' \ + -c 'w -P 0 1M 2M' -c 'discard 3M 4M' blocksize1.img; then + echo "$0: missing or broken qemu-io" + rm blocksize1.img + exit 77 +fi +truncate --size 10M blocksize2.img + +pid1= pid2+ +# Kill any nbdkit processes on exit. +cleanup () +{ + status=$? + + test "$pid1" && kill $pid1 + test "$pid2" && kill $pid2 + # For easier debugging, dump the final log files before removing them. + echo "Log 1 file contents:" + cat blocksize1.log || : + echo "Log 2 file contents:" + cat blocksize2.log || : + rm -f $files + + exit $status +} +trap cleanup INT QUIT TERM EXIT ERR + +# Run two parallel nbdkit; to compare the logs and see what changes. +nbdkit -P blocksize1.pid -U blocksize1.sock \ + --filter=log file logfile=blocksize1.log file=blocksize1.img +nbdkit -P blocksize2.pid -U blocksize2.sock --filter=blocksize \ + --filter=log file logfile=blocksize2.log file=blocksize2.img \ + minblock=512 maxdata=512k maxlen=1M \ + +# We may have to wait a short time for the pid files to appear. +for i in `seq 1 10`; do + if test -f blocksize1.pid && test -f blocksize2.pid; then + break + fi + sleep 1 +done + +pid1="$(cat blocksize1.pid)" || : +pid2="$(cat blocksize2.pid)" || : + +if ! test -f blocksize1.pid || ! test -f blocksize2.pid; then + echo "$0: PID files were not created" + exit 1 +fi + +# Test behavior on short accesses. +$QEMU_IO -f raw -c 'r 1 1' -c 'w 10001 1' -c 'w -z 20001 1' \ + -c 'discard 30001 1' 'nbd+unix://?socket=blocksize1.sock' +$QEMU_IO -f raw -c 'r 1 1' -c 'w 10001 1' -c 'w -z 20001 1' \ + -c 'discard 30001 1' 'nbd+unix://?socket=blocksize2.sock' + +# Read should round up +grep 'connection=1 Read .* count=0x1' blocksize1.log || + { echo "qemu-io can't pass 1-byte reads"; exit 77; } +grep 'connection=1 Read .* offset=0x0 count=0x200' blocksize2.log +# Write should become read-modify-write +grep 'connection=1 Write .* count=0x1' blocksize1.log || + { echo "qemu-io can't pass 1-byte writes"; exit 77; } +grep 'connection=1 Read .* offset=0x2600' blocksize2.log +grep 'connection=1 Write .* offset=0x2600' blocksize2.log +# Zero should become read-modify-write +if grep 'connection=1 Zero' blocksize2.log; then + echo "filter should have converted short zero to write" + exit 1 +fi +grep 'connection=1 Read .* offset=0x4e00' blocksize2.log +grep 'connection=1 Write .* offset=0x4e00' blocksize2.log +# Trim should be discarded +if grep 'connection=1 Trim' blocksize2.log; then + echo "filter should have dropped too-small trim" + exit 1 +fi + +# Test behavior on overlarge accesses. +$QEMU_IO -f raw -c 'w -P 11 1048575 4094305' -c 'w -z 1050000 1100000' \ + -c 'r -P 0 1050000 1100000' -c 'r -P 11 3000000 1048577' \ + -c 'discard 7340031 2097153' 'nbd+unix://?socket=blocksize1.sock' +$QEMU_IO -f raw -c 'w -P 11 1048575 4094305' -c 'w -z 1050000 1100000' \ + -c 'r -P 0 1050000 1100000' -c 'r -P 11 3000000 1048577' \ + -c 'discard 7340031 2097153' 'nbd+unix://?socket=blocksize2.sock' + +# Reads and writes should have been split. +test "$(grep -c '\(Read\|Write\) .*count=0x80000 ' blocksize2.log)" = 10 +grep 'Zero .*count=0x100000 ' blocksize2.log +test "$(grep -c 'connection=2 Zero' blocksize2.log)" = 2 +if grep Trim blocksize1.log; then + test "$(grep -c 'connection=2 Trim .*count=0x100000 ' blocksize2.log)" = 2 +fi + +# Final sanity checks. +if grep 'offset=0x[0-9a-f]*\([13579bcf]00\|[0-9a-f]\(.[^0]\|[^0].\)\) ' \ + blocksize2.log; then + echo "filter didn't align offset to 512"; + exit 1; +fi +if grep 'count=0x[0-9a-f]*\([13579bcf]00\|[0-9a-f]\(.[^0]\|[^0].\)\) ' \ + blocksize2.log; then + echo"filter didn't align count to 512"; + exit 1; +fi +diff -u blocksize1.img blocksize2.img + +# The cleanup() function is called implicitly on exit. -- 2.14.3
Richard W.M. Jones
2018-Feb-01 14:50 UTC
Re: [Libguestfs] [nbdkit PATCH v2 1/3] backend: Rework internal/filter error return semantics
On Wed, Jan 31, 2018 at 09:26:37PM -0600, Eric Blake wrote:> Previously, we let a plugin set an error in either thread-local > storage (nbdkit_set_error()) or errno, then connections.c would > decode which error to use. But with filters in the mix, it is > very difficult for a filter to know what error was set by the > plugin (particularly since nbdkit_set_error() has no public > counterpart for reading the thread-local storage). What's more, > if a filter does any non-trivial processing after calling into > next_ops, it is very probable that errno might be corrupted, > which would affect the error returned by a plugin that relied > on errno but not the error stored in thread-local storage. > > Better is to change the backend interface to just pass the > direct error value, by moving the decoding of thread-local vs. > errno into plugins.c. With the change in decoding location, > the backend interface no longer needs an .errno_is_preserved() > callback. > > For maximum convenience, this change lets a filter return an > error either by passing through the underlying plugin return > (a positive error) or by setting -1 and storing something in > errno. However, I did have to tweak some of the existing > filters to actually handle and/or return the right error; which > means this IS a subtle change to the filter interface (and > worse, one that cannot be detected by the compiler because > the API signatures did not change). However, more ABI changes > are planned with adding FUA support, so as long as it is all > done as part of the same release, we are okay. > > Also note that only nbdkit-plugin.h declares nbdkit_set_error(); > we can actually keep it this way (filters do not need to call > that function). > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > docs/nbdkit-filter.pod | 83 +++++++++++++++++++++++++++++++++++++------ > src/internal.h | 1 - > src/connections.c | 45 ++++++----------------- > src/filters.c | 81 +++++++++++++++++++++++++---------------- > src/plugins.c | 66 +++++++++++++++++++--------------- > filters/cache/cache.c | 49 +++++++++++++++---------- > filters/cow/cow.c | 28 +++++++++------ > filters/partition/partition.c | 2 +- > 8 files changed, 221 insertions(+), 134 deletions(-) > > diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod > index eb72dae..4ddf25d 100644 > --- a/docs/nbdkit-filter.pod > +++ b/docs/nbdkit-filter.pod > @@ -163,10 +163,14 @@ short-circuited. > > The filter’s other methods like C<.prepare>, C<.get_size>, C<.pread> > etc ― always called in the context of a connection ― are passed a > -pointer to C<struct nbdkit_next_ops> which contains a subset of the > -plugin methods that can be called during a connection. It is possible > -for a filter to issue (for example) extra read calls in response to a > -single C<.pwrite> call. > +pointer to C<struct nbdkit_next_ops> which contains a comparable set > +of accessors to plugin methods that can be called during a connection. > +It is possible for a filter to issue (for example) extra read calls in > +response to a single C<.pwrite> call. Note that the semantics of the > +functions in C<struct nbdkit_next_ops> are slightly different from > +what a plugin implements: for example, while a plugin's C<.pread> > +returns -1 on error, C<next_ops->pread> returns a positive errnoI believe you should write this: C<next_ops-E<gt>pread> Similarly in the rest of the document. Looking a the patch as a whole, if I'm understanding it correctly, the functions like backend.pread will now always return 0 or positive errno? Or can they return -1 too? Would this patch have been simpler if we'd just added the nbdkit_get_errno() function to the public API (which is what I thought we were going to do). In that case the filters could check for the errno by doing: if (next_ops->pread (...) == -1) { /* If I need to know the errno, then ... */ int err = nbdkit_get_errno (); ... } Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Richard W.M. Jones
2018-Feb-01 14:53 UTC
Re: [Libguestfs] [nbdkit PATCH v2 3/3] filters: Add blocksize filter
On Wed, Jan 31, 2018 at 09:26:39PM -0600, Eric Blake wrote:> +static int > +blocksize_parse (const char *name, const char *s, unsigned int *v)Maybe use nbdkit_parse_size? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Eric Blake
2018-Feb-01 23:27 UTC
Re: [Libguestfs] [nbdkit PATCH v2 1/3] backend: Rework internal/filter error return semantics
On 01/31/2018 09:26 PM, Eric Blake wrote:> For maximum convenience, this change lets a filter return an > error either by passing through the underlying plugin return > (a positive error) or by setting -1 and storing something in > errno.For filters, this makes sense (all filters are in-tree, so we can audit that the return values follow conventions). But for plugins, we're better off being conservative:> +++ b/src/connections.c> @@ -942,49 +931,36 @@ handle_request (struct connection *conn, > uint32_t f = 0; > bool fua = conn->can_flush && (flags & NBD_CMD_FLAG_FUA); > > - /* The plugin should call nbdkit_set_error() to request a particular > - error, otherwise we fallback to errno or EIO. */ > + /* Clear the error, so that we know if the plugin calls > + nbdkit_set_error() or relied on errno. */ > threadlocal_set_error (0); > > switch (cmd) { > case NBD_CMD_READ: > - if (backend->pread (backend, conn, buf, count, offset, 0) == -1) > - return get_error (conn); > - break; > + return backend->pread (backend, conn, buf, count, offset, 0);...> - default: > - abort (); > + return backend->zero (backend, conn, count, offset, f); > } > - > - return 0; > + /* Unreachable */ > + abort (); > }Prior to the patch, only an explicit -1 value return from the plugin would trigger returning an error code to the client; all other values (whether -2, 0, or positive, even though the documentation only mentioned 0 as valid) would treat things as success on the reply to the client.> +++ b/src/plugins.c> static int > plugin_pread (struct backend *b, struct connection *conn, > void *buf, uint32_t count, uint64_t offset, uint32_t flags) > { > struct backend_plugin *p = container_of (b, struct backend_plugin, backend); > + int r; > > assert (connection_get_handle (conn, 0)); > assert (p->plugin.pread != NULL); > @@ -370,25 +375,29 @@ plugin_pread (struct backend *b, struct connection *conn, > > debug ("pread count=%" PRIu32 " offset=%" PRIu64, count, offset); > > - return p->plugin.pread (connection_get_handle (conn, 0), buf, count, offset); > + r = p->plugin.pread (connection_get_handle (conn, 0), buf, count, offset); > + if (r < 0) > + r = get_error (p); > + return r;But in the new code, I'm blindly returning the plugin's value (even if positive), which may break an out-of-tree plugin that used to return positive on success in spite of the documentation. I'll fix this for v3 to be more careful. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org