Eric Blake
2018-Mar-22 22:35 UTC
[Libguestfs] [nbdkit PATCH] plugins: Add .can_zero callback
Originally, I thought that since the plugin always emulates .zero with a fallback to .pwrite, we didn't need to expose the backend's .can_zero to plugins. But there is another consideration, as shown at least in the nbd plugin: a plugin may implement a .zero callback that depends on support from a remote endpoint. If the remote endpoint doesn't support zeroes, then the plugin HAS to fail .zero with EOPNOTSUPP to get the automatic fallback to .pwrite; this is slightly wasteful to just telling nbdkit to not use the .zero callback when it won't work. At the same time, we still want to advertise zero support to the client, even if we won't be calling .zero, since handling NBD_CMD_WRITE_ZEROES allows for less network traffic; if it is ever truly necessary to avoid advertising to the guest, the nozero filter can accomplish that task. So, this patch exposes a .can_zero callback for plugins, which can usually be omitted, but which can be used to short- circuit calls to .zero, with the nbd plugin as the first user; while plugins.c continues to report true (well, the same result as .can_write) to the backend regardless of the plugin's answer, unless the plugin hits an error. Testing of the nbd changes (and thus of the general nbdkit handling of plugin .can_zero) is best done as part of the nozero filter: using the filter on the server side means the nbd plugin won't see a zero advertisement, and thus the client side does not call nbd's .zero, but still advertises zero to the client. Now that the set of feature callbacks is the same for plugins and filters, we can consolidate the documentation for filters and merely focus on the slight differences. This patch breaks ABI from the earlier exposure of .can_fua to plugins a few patches ago, but that is okay as there has not been a release in the meantime (the logical grouping was nicer this way). Had a release happened, .can_zero would have to be placed after .zero in struct nbdkit_plugin. Also, in plugins.c, rearrange get_error() to a more logical location. Signed-off-by: Eric Blake <eblake@redhat.com> --- This sort of fell out from the v2v rhv-upload conversation on the list, although Rich's v7 patch there demonstrates that even when oVirt lacks zero support, the rhv-upload plugin still wants .zero to be called (because it can optimize zeros done beyond the highest write), so that particular plugin would not be able to take advantage of this change. At any rate, in writing it up, I noticed that the nbd plugin benefits from it, so it's still useful in that right. docs/nbdkit-filter.pod | 79 ++++++++++++++++++------------------------------- docs/nbdkit-plugin.pod | 22 +++++++++++++- include/nbdkit-plugin.h | 1 + src/plugins.c | 72 ++++++++++++++++++++++++++++---------------- plugins/nbd/nbd.c | 15 ++++++---- tests/test-nozero.sh | 34 +++++++++++++++++---- 6 files changed, 135 insertions(+), 88 deletions(-) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index e6bf0c3..9bdc690 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -343,6 +343,10 @@ calls. =head2 C<.can_trim> +=head2 C<.can_zero> + +=head2 C<.can_fua> + int (*can_write) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); int (*can_flush) (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -352,66 +356,39 @@ calls. void *handle); int (*can_trim) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); + int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); These intercept the corresponding plugin methods, and control feature bits advertised to the client. +Of note, the C<.can_zero> callback in the filter controls whether the +server advertises zero support to the client, which is slightly +different semantics than the plugin; that is, +C<next_ops-E<gt>can_zero> always returns true for a plugin, even when +the plugin's own C<.can_zero> callback returned false, because nbdkit +implements a fallback to C<.pwrite> at the plugin layer. + +Remember that most of the feature check functions return merely a +boolean success value, while C<.can_fua> has three success values. +The difference between values may affect choices made in the filter: +when splitting a write request that requested FUA from the client, if +C<next_ops-E<gt>can_fua> returns C<NBDKIT_FUA_NATIVE>, then the filter +should pass the FUA flag on to each sub-request; while if it is known +that FUA is emulated by a flush because of a return of +C<NBDKIT_FUA_EMULATE>, it is more efficient to only flush once after +all sub-requests have completed (often by passing C<NBDKIT_FLAG_FUA> +on to only the final sub-request, or by dropping the flag and ending +with a direct call to C<next_ops-E<gt>flush>). + If there is an error, the callback should call C<nbdkit_error> with an error message and return C<-1>. If these functions are called more than once for the same connection, they should return the same value; similarly, the filter may cache the results of each counterpart in C<next_ops> for a given connection rather than repeating calls. -=head2 C<.can_zero> - - int (*can_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, - void *handle); - -This controls whether write zero support will be advertised to the -client. This function has no counterpart in plugins, because nbdkit -can always emulate zero by using pwrite; but a filter may want to -force different handling than the nbdkit implementation. If this -callback is omitted, the default returned for the plugin layer is -true. - -If there is an error, the callback should call C<nbdkit_error> with an -error message and return C<-1>. Like the other initial queries -documented above, per-connection caching the of return value of this -function is allowed. - -=head2 C<.can_fua> - - int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, - void *handle); - -This controls how the Forced Unit Access flag will be handled; it is -only relevant for connections that are not read-only. This intercepts -the corresponding plugin method, to control feature bits advertised to -the client. Unlike other feature functions with just two success -states, this one returns three success values: C<NBDKIT_FUA_NONE> to -avoid advertising FUA support to the client, C<NBDKIT_FUA_EMULATE> if -FUA is emulated by nbdkit calling flush after a write transaction, and -C<NBDKIT_FUA_NATIVE> if FUA semantics are handled by the plugin -without the overhead of an extra flush from nbdkit. - -The difference between the two non-zero values may affect choices made -in the filter: when splitting a write request that requested FUA from -the client, native support should pass the FUA flag on to each -sub-request; while if it is known that FUA is emulated by a flush, it -is more efficient to only flush once after all sub-requests have -completed (often by passing C<NBDKIT_FLAG_FUA> on to only the final -sub-request, or by dropping the flag and ending with a direct call to -C<next_ops-E<gt>flush>). - -If this callback is omitted, the default returned for the plugin layer -is C<NBDKIT_FUA_EMULATE> if C<can_flush> returned true, otherwise it -is C<NBDKIT_FUA_NONE>. - -If there is an error, the callback should call C<nbdkit_error> with an -error message and return C<-1>. Like the other initial queries -documented above, per-connection caching of the return value of this -function is allowed. - =head2 C<.pread> int (*pread) (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -505,7 +482,7 @@ value to return to the client. This intercepts the plugin C<.zero> method and can be used to modify zero requests. -This function will not be called if C<.can_write> returned false; in +This function will not be called if C<.can_zero> returned false; in turn, the filter should not call C<next_ops-E<gt>zero> if C<next_ops-E<gt>can_zero> did not return true. diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod index d6d080a..6e89315 100644 --- a/docs/nbdkit-plugin.pod +++ b/docs/nbdkit-plugin.pod @@ -487,6 +487,25 @@ error message and return C<-1>. This callback is not required. If omitted, then we return true iff a C<.trim> callback has been defined. +=head2 C<.can_zero> + + int can_zero (void *handle); + +This is called during the option negotiation phase to find out if the +plugin wants the C<.zero> callback to be utilized. Support for +writing zeros is still advertised to the client (unless the nbdkit +filter nozero is also used), so returning false merely serves as a way +to avoid complicating the C<.zero> callback to have to fail with +C<EOPNOTSUPP> on the connections where it will never be more efficient +than using C<.pwrite> up front. + +If there is an error, C<.can_zero> should call C<nbdkit_error> with an +error message and return C<-1>. + +This callback is not required. If omitted, then nbdkit always tries +C<.zero> first if it is present, and gracefully falls back to +C<.pwrite> if C<.zero> was absent or failed with C<EOPNOTSUPP>. + =head2 C<.can_fua> int can_fua (void *handle); @@ -601,7 +620,7 @@ error message, and C<nbdkit_set_error> to record an appropriate error During the data serving phase, this callback is used to write C<count> bytes of zeroes at C<offset> in the backing store. -This function will not be called if C<.can_write> returned false. On +This function will not be called if C<.can_zero> returned false. On input, the parameter C<flags> may include C<NBDKIT_FLAG_MAY_TRIM> unconditionally, and C<NBDKIT_FLAG_FUA> based on the result of C<.can_fua>. @@ -826,6 +845,7 @@ and then users will be able to run it like this: L<nbdkit(1)>, L<nbdkit-filter(3)>, +L<nbdkit-nozero-filter(3)>, L<nbdkit-example1-plugin(1)>, L<nbdkit-example2-plugin(1)>, L<nbdkit-example3-plugin(1)>, diff --git a/include/nbdkit-plugin.h b/include/nbdkit-plugin.h index 3f541a1..65a2e0c 100644 --- a/include/nbdkit-plugin.h +++ b/include/nbdkit-plugin.h @@ -104,6 +104,7 @@ struct nbdkit_plugin { void (*dump_plugin) (void); + int (*can_zero) (void *handle); int (*can_fua) (void *handle); #if NBDKIT_API_VERSION == 1 int (*_unused1) (void *, void *, uint32_t, uint64_t); diff --git a/src/plugins.c b/src/plugins.c index d44e724..ff4facf 100644 --- a/src/plugins.c +++ b/src/plugins.c @@ -355,25 +355,24 @@ plugin_can_trim (struct backend *b, struct connection *conn) return p->plugin.trim || p->plugin._trim_old; } -/* Grab the appropriate error value. - */ -static int -get_error (struct backend_plugin *p) -{ - int ret = threadlocal_get_error (); - - if (!ret && p->plugin.errno_is_preserved) - ret = errno; - return ret ? ret : EIO; -} - static int plugin_can_zero (struct backend *b, struct connection *conn) { + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + + assert (connection_get_handle (conn, 0)); + debug ("can_zero"); - /* We always allow .zero to fall back to .write, so plugins don't - * need to override this. */ + /* Note the special case here: the plugin's .can_zero controls only + * whether we call .zero; while the backend expects .can_zero to + * return whether to advertise zero support. Since we ALWAYS know + * how to fall back to .pwrite in plugin_zero(), we ignore the + * difference between the plugin's true or false return, and only + * call it to catch a -1 failure during negotiation. */ + if (p->plugin.can_zero && + p->plugin.can_zero (connection_get_handle (conn, 0)) == -1) + return -1; return plugin_can_write (b, conn); } @@ -401,6 +400,18 @@ plugin_can_fua (struct backend *b, struct connection *conn) return NBDKIT_FUA_EMULATE; } +/* Grab the appropriate error value. + */ +static int +get_error (struct backend_plugin *p) +{ + int ret = threadlocal_get_error (); + + if (!ret && p->plugin.errno_is_preserved) + ret = errno; + return ret ? ret : EIO; +} + static int plugin_pread (struct backend *b, struct connection *conn, void *buf, uint32_t count, uint64_t offset, uint32_t flags, @@ -534,6 +545,7 @@ plugin_zero (struct backend *b, struct connection *conn, bool fua = flags & NBDKIT_FLAG_FUA; bool emulate = false; bool need_flush = false; + int can_zero = 1; /* TODO cache this per-connection? */ assert (connection_get_handle (conn, 0)); assert (!(flags & ~(NBDKIT_FLAG_MAY_TRIM | NBDKIT_FLAG_FUA))); @@ -547,18 +559,26 @@ plugin_zero (struct backend *b, struct connection *conn, } if (!count) return 0; - errno = 0; - if (p->plugin.zero) - r = p->plugin.zero (connection_get_handle (conn, 0), count, offset, flags); - else if (p->plugin._zero_old) - r = p->plugin._zero_old (connection_get_handle (conn, 0), count, offset, - may_trim); - else - emulate = true; - if (r == -1) - *err = emulate ? EOPNOTSUPP : get_error (p); - if (r == 0 || *err != EOPNOTSUPP) - goto done; + if (p->plugin.can_zero) { + can_zero = p->plugin.can_zero (connection_get_handle (conn, 0)); + assert (can_zero != -1); + } + + if (can_zero) { + errno = 0; + if (p->plugin.zero) + r = p->plugin.zero (connection_get_handle (conn, 0), count, offset, + flags); + else if (p->plugin._zero_old) + r = p->plugin._zero_old (connection_get_handle (conn, 0), count, offset, + may_trim); + else + emulate = true; + if (r == -1) + *err = emulate ? EOPNOTSUPP : get_error (p); + if (r == 0 || *err != EOPNOTSUPP) + goto done; + } assert (p->plugin.pwrite || p->plugin._pwrite_old); flags &= ~NBDKIT_FLAG_MAY_TRIM; diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index dd84b04..51de178 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -618,6 +618,14 @@ nbd_can_trim (void *handle) return h->flags & NBD_FLAG_SEND_TRIM; } +static int +nbd_can_zero (void *handle) +{ + struct handle *h = handle; + + return h->flags & NBD_FLAG_SEND_WRITE_ZEROES; +} + static int nbd_can_fua (void *handle) { @@ -662,11 +670,7 @@ nbd_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) int f = 0; assert (!(flags & ~(NBDKIT_FLAG_FUA | NBDKIT_FLAG_MAY_TRIM))); - if (!(h->flags & NBD_FLAG_SEND_WRITE_ZEROES)) { - /* Trigger a fall back to regular writing */ - errno = EOPNOTSUPP; - return -1; - } + assert (h->flags & NBD_FLAG_SEND_WRITE_ZEROES); if (!(flags & NBDKIT_FLAG_MAY_TRIM)) f |= NBD_CMD_FLAG_NO_HOLE; @@ -716,6 +720,7 @@ static struct nbdkit_plugin plugin = { .can_flush = nbd_can_flush, .is_rotational = nbd_is_rotational, .can_trim = nbd_can_trim, + .can_zero = nbd_can_zero, .can_fua = nbd_can_fua, .pread = nbd_pread, .pwrite = nbd_pwrite, diff --git a/tests/test-nozero.sh b/tests/test-nozero.sh index 95ed045..57c4452 100755 --- a/tests/test-nozero.sh +++ b/tests/test-nozero.sh @@ -36,7 +36,9 @@ set -e files="nozero1.img nozero1.log nozero1.sock nozero1.pid nozero2.img nozero2.log nozero2.sock nozero2.pid nozero3.img nozero3.log nozero3.sock nozero3.pid - nozero4.img nozero4.log nozero4.sock nozero4.pid" + nozero4.img nozero4.log nozero4.sock nozero4.pid + nozero5.img nozero5a.log nozero5b.log nozero5a.sock nozero5b.sock + nozero5a.pid nozero5b.pid" rm -f $files # Prep images, and check that qemu-io understands the actions we plan on @@ -45,6 +47,7 @@ for f in {0..1023}; do printf '%1024s' . >> nozero1.img; done cp nozero1.img nozero2.img cp nozero1.img nozero3.img cp nozero1.img nozero4.img +cp nozero1.img nozero5.img if ! qemu-io -f raw -d unmap -c 'w -z -u 0 1M' nozero1.img; then echo "$0: missing or broken qemu-io" rm nozero?.img @@ -57,7 +60,7 @@ if test "$(stat -c %b nozero1.img)" = "$(stat -c %b nozero2.img)"; then fi cp nozero2.img nozero1.img -pid1= pid2= pid3= pid4+pid1= pid2= pid3= pid4= pid5a= pid5b # Kill any nbdkit processes on exit. cleanup () @@ -68,6 +71,8 @@ cleanup () test "$pid2" && kill $pid2 test "$pid3" && kill $pid3 test "$pid4" && kill $pid4 + test "$pid5a" && kill $pid5a + test "$pid5b" && kill $pid5b # For easier debugging, dump the final log files before removing them. echo "Log 1 file contents:" cat nozero1.log || : @@ -77,6 +82,10 @@ cleanup () cat nozero3.log || : echo "Log 4 file contents:" cat nozero4.log || : + echo "Log 5a file contents:" + cat nozero5a.log || : + echo "Log 5b file contents:" + cat nozero5b.log || : rm -f $files exit $status @@ -88,6 +97,8 @@ trap cleanup INT QUIT TERM EXIT ERR # 2: log before filter with zeromode=none (default), to ensure no ZERO request # 3: log before filter with zeromode=emulate, to ensure ZERO from client # 4: log after filter with zeromode=emulate, to ensure no ZERO to plugin +# 5a/b: both sides of nbd plugin: even though server side does not advertise +# ZERO, the client side still exposes it, and just skips calling nbd's .zero nbdkit -P nozero1.pid -U nozero1.sock --filter=log \ file logfile=nozero1.log file=nozero1.img nbdkit -P nozero2.pid -U nozero2.sock --filter=log --filter=nozero \ @@ -96,11 +107,15 @@ nbdkit -P nozero3.pid -U nozero3.sock --filter=log --filter=nozero \ file logfile=nozero3.log file=nozero3.img zeromode=emulate nbdkit -P nozero4.pid -U nozero4.sock --filter=nozero --filter=log \ file logfile=nozero4.log file=nozero4.img zeromode=emulate +nbdkit -P nozero5a.pid -U nozero5a.sock --filter=log --filter=nozero \ + file logfile=nozero5a.log file=nozero5.img +nbdkit -P nozero5b.pid -U nozero5b.sock --filter=log \ + nbd logfile=nozero5b.log socket=nozero5a.sock # We may have to wait a short time for the pid files to appear. for i in `seq 1 10`; do if test -f nozero1.pid && test -f nozero2.pid && test -f nozero3.pid && - test -f nozero4.pid; then + test -f nozero4.pid && test -f nozero5a.pid && test -f nozero5b.pid; then break fi sleep 1 @@ -110,9 +125,11 @@ pid1="$(cat nozero1.pid)" || : pid2="$(cat nozero2.pid)" || : pid3="$(cat nozero3.pid)" || : pid4="$(cat nozero4.pid)" || : +pid5a="$(cat nozero5a.pid)" || : +pid5b="$(cat nozero5b.pid)" || : if ! test -f nozero1.pid || ! test -f nozero2.pid || ! test -f nozero3.pid || - ! test -f nozero4.pid; then + ! test -f nozero4.pid || ! test -f nozero5a.pid || ! test -f nozero5b.pid; then echo "$0: PID files were not created" exit 1 fi @@ -122,6 +139,7 @@ qemu-io -f raw -c 'w -z -u 0 1M' 'nbd+unix://?socket=nozero1.sock' qemu-io -f raw -c 'w -z -u 0 1M' 'nbd+unix://?socket=nozero2.sock' qemu-io -f raw -c 'w -z -u 0 1M' 'nbd+unix://?socket=nozero3.sock' qemu-io -f raw -c 'w -z -u 0 1M' 'nbd+unix://?socket=nozero4.sock' +qemu-io -f raw -c 'w -z -u 0 1M' 'nbd+unix://?socket=nozero5b.sock' # Check for expected ZERO vs. WRITE results grep 'connection=1 Zero' nozero1.log @@ -134,10 +152,16 @@ if grep 'connection=1 Zero' nozero4.log; then echo "filter should have converted zero into write" exit 1 fi +grep 'connection=1 Zero' nozero5b.log +if grep 'connection=1 Zero' nozero5a.log; then + echo "nbdkit should have converted zero into write before nbd plugin" + exit 1 +fi -# Sanity check on contents - all 4 files should read identically +# Sanity check on contents - all 5 files should read identically cmp nozero1.img nozero2.img cmp nozero2.img nozero3.img cmp nozero3.img nozero4.img +cmp nozero4.img nozero5.img # The cleanup() function is called implicitly on exit. -- 2.14.3
Richard W.M. Jones
2018-Mar-23 14:54 UTC
Re: [Libguestfs] [nbdkit PATCH] plugins: Add .can_zero callback
On Thu, Mar 22, 2018 at 05:35:17PM -0500, Eric Blake wrote:> Originally, I thought that since the plugin always emulates > .zero with a fallback to .pwrite, we didn't need to expose the > backend's .can_zero to plugins. But there is another > consideration, as shown at least in the nbd plugin: a plugin > may implement a .zero callback that depends on support from > a remote endpoint. If the remote endpoint doesn't support > zeroes, then the plugin HAS to fail .zero with EOPNOTSUPP to > get the automatic fallback to .pwrite; this is slightly > wasteful to just telling nbdkit to not use the .zero callback > when it won't work. > > At the same time, we still want to advertise zero support to > the client, even if we won't be calling .zero, since handling > NBD_CMD_WRITE_ZEROES allows for less network traffic; if it > is ever truly necessary to avoid advertising to the guest, the > nozero filter can accomplish that task. > > So, this patch exposes a .can_zero callback for plugins, > which can usually be omitted, but which can be used to short- > circuit calls to .zero, with the nbd plugin as the first user; > while plugins.c continues to report true (well, the same result > as .can_write) to the backend regardless of the plugin's > answer, unless the plugin hits an error. > > Testing of the nbd changes (and thus of the general nbdkit > handling of plugin .can_zero) is best done as part of the > nozero filter: using the filter on the server side means the > nbd plugin won't see a zero advertisement, and thus the client > side does not call nbd's .zero, but still advertises zero to > the client. > > Now that the set of feature callbacks is the same for plugins > and filters, we can consolidate the documentation for filters > and merely focus on the slight differences. > > This patch breaks ABI from the earlier exposure of .can_fua to > plugins a few patches ago, but that is okay as there has not > been a release in the meantime (the logical grouping was nicer > this way). Had a release happened, .can_zero would have to > be placed after .zero in struct nbdkit_plugin. Also, in > plugins.c, rearrange get_error() to a more logical location. > > Signed-off-by: Eric Blake <eblake@redhat.com>Yup, looks good, ACK. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Reasonably Related Threads
- [nbdkit PATCH] nozero: Add notrim mode
- [PATCH nbdkit 4/4] tests: Add a helper function which waits for nbdkit to start up.
- [PATCH v2 nbdkit 5/5] tests: Add a helper function which waits for nbdkit to start up.
- [PATCH nbdkit 3/6] file: Make the file= parameter into a magic config key.
- [PATCH nbdkit v2 3/6] file: Make the file= parameter into a magic config key.