Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 0/9] RFC: implement NBD_CMD_CACHE
I'm still working my way through the filters before this series will be complete, but this is enough of a start to at least get some feedback on the idea of implementing another NBD protocol extension. Eric Blake (9): server: Internal hooks for implementing NBD_CMD_CACHE plugins: Add .cache callback file, split: Implement .cache with posix_fadvise nbd: Implement NBD_CMD_CACHE passthrough plugins: Implement no-op .cache for in-memory plugins sh: Implement .cache script callback filters: Add .cache callback test-layers: Test .cache usage blocksize: Implement .cache rounding docs/nbdkit-filter.pod | 27 ++++++++++++++- docs/nbdkit-plugin.pod | 52 ++++++++++++++++++++++++++++ docs/nbdkit-protocol.pod | 10 ++++++ plugins/sh/nbdkit-sh-plugin.pod | 12 ++++++- configure.ac | 3 +- common/protocol/protocol.h | 2 ++ include/nbdkit-filter.h | 8 +++++ include/nbdkit-plugin.h | 2 ++ server/internal.h | 4 +++ filters/blocksize/blocksize.c | 31 ++++++++++++++++- plugins/data/data.c | 10 ++++++ plugins/file/file.c | 22 ++++++++++++ plugins/full/full.c | 12 ++++++- plugins/memory/memory.c | 10 ++++++ plugins/nbd/nbd.c | 22 ++++++++++++ plugins/null/null.c | 12 ++++++- plugins/pattern/pattern.c | 12 ++++++- plugins/random/random.c | 12 ++++++- plugins/sh/sh.c | 40 ++++++++++++++++++++++ plugins/split/split.c | 37 +++++++++++++++++++- plugins/streaming/streaming.c | 11 ++++++ plugins/zero/zero.c | 12 ++++++- server/filters.c | 60 ++++++++++++++++++++++++++++++++- server/plugins.c | 39 +++++++++++++++++++++ server/protocol-handshake.c | 10 +++++- server/protocol.c | 25 ++++++++++++++ tests/test-layers-filter.c | 22 +++++++++++- tests/test-layers-plugin.c | 17 ++++++++++ tests/test-layers.c | 36 ++++++++++++++++++++ tests/test-eflags.sh | 53 +++++++++++++++-------------- 30 files changed, 587 insertions(+), 38 deletions(-) -- 2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 1/9] server: Internal hooks for implementing NBD_CMD_CACHE
Until the next few patches expose and implement new callbacks for filters and plugins, our initial implementation for NBD_CMD_CACHE is to just blindly advertise the feature, and call into .pread with an ignored buffer. Note that for bisection reasons, this patch treats any use of a filter as a forced .can_cache of false to bypass the filter's caching; once all affected filters are patched to handle cache requests correctly, a later patch will then switch the filter default to passthrough for the sake of remaining filters. Signed-off-by: Eric Blake <eblake@redhat.com> --- Note: we could also choose to always advertise caching, but make the nbdkit default version be a no-op rather than call out to .pread. For that matter, maybe a plugin's .can_cache should allow the plugin to choose between a tri-state of no-op, fallback to .pread, or call .cache; I'm not sure which is the saner default for a random plugin compiled against older nbdkit. At least it's fairly easy to write a filter that changes the default from no-op to .pread or from .pread to no-op. Be thinking about that as you review the rest of the series. --- docs/nbdkit-protocol.pod | 10 +++++++ common/protocol/protocol.h | 2 ++ server/internal.h | 4 +++ server/filters.c | 33 ++++++++++++++++++++++- server/plugins.c | 34 ++++++++++++++++++++++++ server/protocol-handshake.c | 10 ++++++- server/protocol.c | 25 +++++++++++++++++ tests/test-eflags.sh | 53 ++++++++++++++++++++----------------- 8 files changed, 144 insertions(+), 27 deletions(-) diff --git a/docs/nbdkit-protocol.pod b/docs/nbdkit-protocol.pod index f706cfd..6b85c14 100644 --- a/docs/nbdkit-protocol.pod +++ b/docs/nbdkit-protocol.pod @@ -152,6 +152,16 @@ when structured replies are in effect. However, the flag is a no-op until we extend the plugin API to allow a fragmented read in the first place. +=item C<NBD_CMD_CACHE> + +Supported in nbdkit E<ge> 1.13.4. + +This protocol extension allows a client to inform the server about +intent to access a portion of the export, to allow the server an +opportunity to cache things appropriately. For plugins that do not +support a particular form of caching, nbdkit performs pread then +ignores the results. + =item Resize Extension I<Not supported>. diff --git a/common/protocol/protocol.h b/common/protocol/protocol.h index c27104c..e938643 100644 --- a/common/protocol/protocol.h +++ b/common/protocol/protocol.h @@ -95,6 +95,7 @@ extern const char *name_of_nbd_flag (int); #define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6) #define NBD_FLAG_SEND_DF (1 << 7) #define NBD_FLAG_CAN_MULTI_CONN (1 << 8) +#define NBD_FLAG_SEND_CACHE (1 << 10) /* NBD options (new style handshake only). */ extern const char *name_of_nbd_opt (int); @@ -217,6 +218,7 @@ extern const char *name_of_nbd_cmd (int); #define NBD_CMD_DISC 2 /* Disconnect. */ #define NBD_CMD_FLUSH 3 #define NBD_CMD_TRIM 4 +#define NBD_CMD_CACHE 5 #define NBD_CMD_WRITE_ZEROES 6 #define NBD_CMD_BLOCK_STATUS 7 diff --git a/server/internal.h b/server/internal.h index 67fccfc..6414a78 100644 --- a/server/internal.h +++ b/server/internal.h @@ -170,6 +170,7 @@ struct connection { bool can_zero; bool can_fua; bool can_multi_conn; + bool can_cache; bool can_extents; bool using_tls; bool structured_replies; @@ -276,6 +277,7 @@ struct backend { int (*can_extents) (struct backend *, struct connection *conn); int (*can_fua) (struct backend *, struct connection *conn); int (*can_multi_conn) (struct backend *, struct connection *conn); + int (*can_cache) (struct backend *, struct connection *conn); int (*pread) (struct backend *, struct connection *conn, void *buf, uint32_t count, uint64_t offset, uint32_t flags, int *err); @@ -290,6 +292,8 @@ struct backend { int (*extents) (struct backend *, struct connection *conn, uint32_t count, uint64_t offset, uint32_t flags, struct nbdkit_extents *extents, int *err); + int (*cache) (struct backend *, struct connection *conn, uint32_t count, + uint64_t offset, uint32_t flags, int *err); }; /* plugins.c */ diff --git a/server/filters.c b/server/filters.c index b73e74f..c619fd6 100644 --- a/server/filters.c +++ b/server/filters.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -573,6 +573,18 @@ filter_can_multi_conn (struct backend *b, struct connection *conn) return f->backend.next->can_multi_conn (f->backend.next, conn); } +static int +filter_can_cache (struct backend *b, struct connection *conn) +{ + struct backend_filter *f = container_of (b, struct backend_filter, backend); + + debug ("%s: can_cache", f->name); + + /* FIXME: Default to f->backend.next->can_cache, once all filters + have been audited */ + return 0; +} + static int filter_pread (struct backend *b, struct connection *conn, void *buf, uint32_t count, uint64_t offset, @@ -702,6 +714,23 @@ filter_extents (struct backend *b, struct connection *conn, extents, err); } +static int +filter_cache (struct backend *b, struct connection *conn, + uint32_t count, uint64_t offset, + uint32_t flags, int *err) +{ + struct backend_filter *f = container_of (b, struct backend_filter, backend); + + assert (flags == 0); + + debug ("%s: cache count=%" PRIu32 " offset=%" PRIu64 " flags=0x%" PRIx32, + f->name, count, offset, flags); + + /* FIXME: Allow filter to rewrite request */ + return f->backend.next->cache (f->backend.next, conn, + count, offset, flags, err); +} + static struct backend filter_functions = { .free = filter_free, .thread_model = filter_thread_model, @@ -726,12 +755,14 @@ static struct backend filter_functions = { .can_extents = filter_can_extents, .can_fua = filter_can_fua, .can_multi_conn = filter_can_multi_conn, + .can_cache = filter_can_cache, .pread = filter_pread, .pwrite = filter_pwrite, .flush = filter_flush, .trim = filter_trim, .zero = filter_zero, .extents = filter_extents, + .cache = filter_cache, }; /* Register and load a filter. */ diff --git a/server/plugins.c b/server/plugins.c index e26d133..cabf543 100644 --- a/server/plugins.c +++ b/server/plugins.c @@ -448,6 +448,17 @@ plugin_can_multi_conn (struct backend *b, struct connection *conn) return 0; /* assume false */ } +static int +plugin_can_cache (struct backend *b, struct connection *conn) +{ + assert (connection_get_handle (conn, 0)); + + debug ("can_cache"); + + /* FIXME: return default based on plugin->cache */ + return 0; +} + /* Plugins and filters can call this to set the true errno, in cases * where !errno_is_preserved. */ @@ -693,6 +704,27 @@ plugin_extents (struct backend *b, struct connection *conn, return r; } +static int +plugin_cache (struct backend *b, struct connection *conn, + uint32_t count, uint64_t offset, uint32_t flags, + int *err) +{ + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + int r = -1; + + assert (connection_get_handle (conn, 0)); + assert (!flags); + + debug ("cache count=%" PRIu32 " offset=%" PRIu64, count, offset); + + /* FIXME: assert plugin->cache and call it */ + assert (false); + + if (r == -1) + *err = get_error (p); + return r; +} + static struct backend plugin_functions = { .free = plugin_free, .thread_model = plugin_thread_model, @@ -717,12 +749,14 @@ static struct backend plugin_functions = { .can_extents = plugin_can_extents, .can_fua = plugin_can_fua, .can_multi_conn = plugin_can_multi_conn, + .can_cache = plugin_can_cache, .pread = plugin_pread, .pwrite = plugin_pwrite, .flush = plugin_flush, .trim = plugin_trim, .zero = plugin_zero, .extents = plugin_extents, + .cache = plugin_cache, }; /* Register and load a plugin. */ diff --git a/server/protocol-handshake.c b/server/protocol-handshake.c index 03377a9..af1cd18 100644 --- a/server/protocol-handshake.c +++ b/server/protocol-handshake.c @@ -109,7 +109,7 @@ protocol_compute_eflags (struct connection *conn, uint16_t *flags) conn->can_multi_conn = true; } - /* The result of this is not returned to callers here (or at any + /* The results of the next two are not returned to callers here (or at any * time during the handshake). However it makes sense to do it once * per connection and store the result in the handle anyway. This * protocol_compute_eflags function is a bit misnamed XXX. @@ -120,6 +120,14 @@ protocol_compute_eflags (struct connection *conn, uint16_t *flags) if (fl) conn->can_extents = true; + fl = backend->can_cache (backend, conn); + if (fl == -1) + return -1; + if (fl) + conn->can_cache = true; + /* We emulate cache even when plugin doesn't support it */ + eflags |= NBD_FLAG_SEND_CACHE; + if (conn->structured_replies) eflags |= NBD_FLAG_SEND_DF; diff --git a/server/protocol.c b/server/protocol.c index 01d4c71..69227e5 100644 --- a/server/protocol.c +++ b/server/protocol.c @@ -76,6 +76,7 @@ validate_request (struct connection *conn, /* Validate cmd, offset, count. */ switch (cmd) { case NBD_CMD_READ: + case NBD_CMD_CACHE: case NBD_CMD_WRITE: case NBD_CMD_TRIM: case NBD_CMD_WRITE_ZEROES: @@ -254,6 +255,30 @@ handle_request (struct connection *conn, return err; break; + case NBD_CMD_CACHE: + /* The other backend methods don't check can_*. That is because + * those methods are implicitly suppressed by returning fewer + * eflags to the client. However the eflag for cache is always + * sent, because we emulate it here when the plugin lacks it. + */ + if (conn->can_cache) { + if (backend->cache (backend, conn, count, offset, 0, &err) == -1) + return err; + } + else { + static char buf[MAX_REQUEST_SIZE]; /* data sink, never read */ + uint32_t limit; + + while (count) { + limit = MIN (count, sizeof buf); + if (backend->pread (backend, conn, buf, limit, offset, flags, + &err) == -1) + return err; + count -= limit; + } + } + break; + case NBD_CMD_WRITE_ZEROES: if (!(flags & NBD_CMD_FLAG_NO_HOLE)) f |= NBDKIT_FLAG_MAY_TRIM; diff --git a/tests/test-eflags.sh b/tests/test-eflags.sh index 84dcfe8..0234aca 100755 --- a/tests/test-eflags.sh +++ b/tests/test-eflags.sh @@ -1,6 +1,6 @@ #!/usr/bin/env bash # nbdkit -# Copyright (C) 2018 Red Hat Inc. +# Copyright (C) 2018-2019 Red Hat Inc. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are @@ -94,8 +94,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] || - fail "expected HAS_FLAGS|READ_ONLY|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE )) ] || + fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE" #---------------------------------------------------------------------- # -r @@ -108,8 +108,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] || - fail "expected HAS_FLAGS|READ_ONLY|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE )) ] || + fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE" #---------------------------------------------------------------------- # can_write=true @@ -126,8 +126,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|SEND_WRITE_ZEROES|SEND_DF )) ] || - fail "expected HAS_FLAGS|SEND_WRITE_ZEROES|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE )) ] || + fail "expected HAS_FLAGS|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE" #---------------------------------------------------------------------- # -r @@ -144,8 +144,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] || - fail "expected HAS_FLAGS|READ_ONLY|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE )) ] || + fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE" #---------------------------------------------------------------------- # can_write=false @@ -164,8 +164,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] || - fail "expected HAS_FLAGS|READ_ONLY|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE )) ] || + fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE" #---------------------------------------------------------------------- # -r @@ -185,8 +185,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] || - fail "expected HAS_FLAGS|READ_ONLY|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE )) ] || + fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE" #---------------------------------------------------------------------- # can_write=true @@ -201,8 +201,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|SEND_TRIM|SEND_WRITE_ZEROES|SEND_DF )) ] || - fail "expected HAS_FLAGS|SEND_TRIM|SEND_WRITE_ZEROES|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|SEND_TRIM|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE )) ] || + fail "expected HAS_FLAGS|SEND_TRIM|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE" #---------------------------------------------------------------------- # can_write=true @@ -217,8 +217,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|ROTATIONAL|SEND_WRITE_ZEROES|SEND_DF )) ] || - fail "expected HAS_FLAGS|ROTATIONAL|SEND_WRITE_ZEROES|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|ROTATIONAL|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE )) ] || + fail "expected HAS_FLAGS|ROTATIONAL|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE" #---------------------------------------------------------------------- # -r @@ -234,8 +234,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|ROTATIONAL|SEND_DF )) ] || - fail "expected HAS_FLAGS|READ_ONLY|ROTATIONAL|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|ROTATIONAL|SEND_DF|SEND_CACHE )) ] || + fail "expected HAS_FLAGS|READ_ONLY|ROTATIONAL|SEND_DF|SEND_CACHE" #---------------------------------------------------------------------- # can_write=true @@ -250,8 +250,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF )) ] || - fail "expected HAS_FLAGS|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE )) ] || + fail "expected HAS_FLAGS|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE" #---------------------------------------------------------------------- # -r @@ -269,8 +269,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] || - fail "expected HAS_FLAGS|READ_ONLY|SEND_DF" +[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE )) ] || + fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE" #---------------------------------------------------------------------- # -r @@ -284,5 +284,8 @@ case "$1" in esac EOF -[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|CAN_MULTI_CONN )) ] || - fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|CAN_MULTI_CONN" +[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|CAN_MULTI_CONN|SEND_CACHE )) ] || + fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|CAN_MULTI_CONN|SEND_CACHE" + +# NBD_FLAG_SEND_CACHE is always set even if can_cache returns false, because +# nbdkit reckons it can emulate caching using pread. -- 2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 2/9] plugins: Add .cache callback
Make it possible for plugins to provide an alternative behavior for NBD_CMD_CACHE rather than just discarding the buffer of .pread. Signed-off-by: Eric Blake <eblake@redhat.com> --- docs/nbdkit-plugin.pod | 52 +++++++++++++++++++++++++++++++++++++++++ include/nbdkit-plugin.h | 2 ++ server/plugins.c | 17 +++++++++----- 3 files changed, 65 insertions(+), 6 deletions(-) diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod index 318be4c..0657151 100644 --- a/docs/nbdkit-plugin.pod +++ b/docs/nbdkit-plugin.pod @@ -620,6 +620,31 @@ with an error message and return C<-1>. This callback is not required. If omitted, then we return false. +=head2 C<.can_cache> + + int can_cache (void *handle); + +This is called during the option negotiation phase to find out if the +plugin supports a cache operation. The nature of the caching is +unspecified (including whether there are limits on how much can be +cached at once, and whether writes to a cached region have +write-through or write-back semantics), but the command exists to let +clients issue a hint to the server that they will be accessing that +region of the export. + +If there is an error, C<.can_cache> should call C<nbdkit_error> with +an error message and return C<-1>. Since the NBD protocol has +reserved the right to add future flags to mandate specific caching +capabilities, this function should only return C<1> on success (in the +future, other positive results may be treated as a bitmask of +supported flags). + +This callback is not required. If omitted, then we return true iff a +C<.cache> callback has been defined. Note that nbdkit always +advertises caching capabilities to the client regardless of this +callback; if this callback returns false, nbdkit handles a client +cache request by calling C<.pread> and ignoring the resulting data. + =head2 C<.pread> int pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -814,6 +839,33 @@ C<nbdkit_extent_add> returns C<0> on success or C<-1> on failure. On failure C<nbdkit_error> and/or C<nbdkit_set_error> has already been called. C<errno> will be set to a suitable value. +=head2 C<.cache> + + int cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags); + +During the data serving phase, this callback is used to give the +plugin a hint that the client intends to make further accesses to the +given region of the export. The nature of caching is not specified +further by the NBD specification (for example, a server may place +limits on how much may be cached at once, and there is no way to +control if writes to a cached area have write-through or write-back +semantics). In fact, the cache command can always fail and still be +compliant, and success might not guarantee a performance gain. If +this callback is omitted, nbdkit defaults to providing cache semantics +obtained by calling C<.pread> over the same region and ignoring the +results. + +This function will not be called if C<.can_cache> returned false. The +parameter C<flags> exists in case of future NBD protocol extensions; +at this time, it will be 0 on input. A plugin must fail this function +if C<flags> includes an unrecognized flag, as that may indicate a +requirement that the plugin comply must with a specific caching +semantic. + +If there is an error, C<.cache> should call C<nbdkit_error> with an +error message, and C<nbdkit_set_error> to record an appropriate error +(unless C<errno> is sufficient), then return C<-1>. + =head1 THREADS Each nbdkit plugin must declare its thread safety model by defining diff --git a/include/nbdkit-plugin.h b/include/nbdkit-plugin.h index 54b4ce2..e9b1808 100644 --- a/include/nbdkit-plugin.h +++ b/include/nbdkit-plugin.h @@ -128,6 +128,8 @@ struct nbdkit_plugin { int (*can_extents) (void *handle); int (*extents) (void *handle, uint32_t count, uint64_t offset, uint32_t flags, struct nbdkit_extents *extents); + int (*can_cache) (void *handle); + int (*cache) (void *handle, uint32_t count, uint64_t offset, uint32_t flags); }; extern void nbdkit_set_error (int err); diff --git a/server/plugins.c b/server/plugins.c index cabf543..947fe79 100644 --- a/server/plugins.c +++ b/server/plugins.c @@ -201,6 +201,8 @@ plugin_dump_fields (struct backend *b) HAS (can_multi_conn); HAS (can_extents); HAS (extents); + HAS (can_cache); + HAS (cache); #undef HAS /* Custom fields. */ @@ -451,12 +453,16 @@ plugin_can_multi_conn (struct backend *b, struct connection *conn) static int plugin_can_cache (struct backend *b, struct connection *conn) { + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + assert (connection_get_handle (conn, 0)); debug ("can_cache"); - /* FIXME: return default based on plugin->cache */ - return 0; + if (p->plugin.can_cache) + return p->plugin.can_cache (connection_get_handle (conn, 0)); + else + return p->plugin.cache != NULL; } /* Plugins and filters can call this to set the true errno, in cases @@ -710,16 +716,15 @@ plugin_cache (struct backend *b, struct connection *conn, int *err) { struct backend_plugin *p = container_of (b, struct backend_plugin, backend); - int r = -1; + int r; assert (connection_get_handle (conn, 0)); assert (!flags); + assert (p->plugin.cache); debug ("cache count=%" PRIu32 " offset=%" PRIu64, count, offset); - /* FIXME: assert plugin->cache and call it */ - assert (false); - + r = p->plugin.cache (connection_get_handle (conn, 0), count, offset, flags); if (r == -1) *err = get_error (p); return r; -- 2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 3/9] file, split: Implement .cache with posix_fadvise
Since NBD_CMD_CACHE is already advisory, let's use an advisory kernel interface to implement it ;) When posix_fadvise() is not present, it is likely that nbdkit's fallback to .pread will actually have a similar benefit in populating the filesystem cache, since we aren't using O_DIRECT to avoid that cache. Signed-off-by: Eric Blake <eblake@redhat.com> --- configure.ac | 3 ++- plugins/file/file.c | 22 ++++++++++++++++++++++ plugins/split/split.c | 37 ++++++++++++++++++++++++++++++++++++- 3 files changed, 60 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index 58031f3..06124c5 100644 --- a/configure.ac +++ b/configure.ac @@ -195,7 +195,8 @@ dnl Check for functions in libc, all optional. AC_CHECK_FUNCS([\ fdatasync \ get_current_dir_name \ - mkostemp]) + mkostemp \ + posix_fadvise]) dnl Check whether printf("%m") works AC_CACHE_CHECK([whether the printf family supports %m], diff --git a/plugins/file/file.c b/plugins/file/file.c index f0ac23b..85b033b 100644 --- a/plugins/file/file.c +++ b/plugins/file/file.c @@ -608,6 +608,25 @@ file_extents (void *handle, uint32_t count, uint64_t offset, } #endif /* SEEK_HOLE */ +#if HAVE_POSIX_FADVISE +/* Caching. */ +static int +file_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + struct handle *h = handle; + int r; + + /* Cache is advisory, we don't care if this fails */ + r = posix_fadvise (h->fd, offset, count, POSIX_FADV_WILLNEED); + if (r) { + errno = r; + nbdkit_error ("posix_fadvise: %m"); + return -1; + } + return 0; +} +#endif /* HAVE_POSIX_FADVISE */ + static struct nbdkit_plugin plugin = { .name = "file", .longname = "nbdkit file plugin", @@ -632,6 +651,9 @@ static struct nbdkit_plugin plugin = { #ifdef SEEK_HOLE .can_extents = file_can_extents, .extents = file_extents, +#endif +#if HAVE_POSIX_FADVISE + .cache = file_cache, #endif .errno_is_preserved = 1, }; diff --git a/plugins/split/split.c b/plugins/split/split.c index cf2b2c7..cef2d51 100644 --- a/plugins/split/split.c +++ b/plugins/split/split.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -277,6 +277,38 @@ split_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset) return 0; } +#if HAVE_POSIX_FADVISE +/* Caching. */ +static int +split_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + struct handle *h = handle; + + /* Cache is advisory, we don't care if this fails */ + while (count > 0) { + struct file *file = get_file (h, offset); + uint64_t foffs = offset - file->offset; + uint64_t max; + int r; + + max = file->size - foffs; + if (max > count) + max = count; + + r = posix_fadvise (file->fd, offset, max, POSIX_FADV_WILLNEED); + if (r) { + errno = r; + nbdkit_error ("posix_fadvise: %m"); + return -1; + } + count -= r; + offset += r; + } + + return 0; +} +#endif /* HAVE_POSIX_FADVISE */ + static struct nbdkit_plugin plugin = { .name = "split", .version = PACKAGE_VERSION, @@ -289,6 +321,9 @@ static struct nbdkit_plugin plugin = { .get_size = split_get_size, .pread = split_pread, .pwrite = split_pwrite, +#if HAVE_POSIX_FADVISE + .cache = split_cache, +#endif /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ -- 2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 4/9] nbd: Implement NBD_CMD_CACHE passthrough
In the nbd plugin, if the remote server supports any form of caching, we should utilize that rather than nbdkit's fallback to .pread. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbd.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 821f256..71549d7 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -1151,6 +1151,14 @@ nbd_can_multi_conn (void *handle) return h->flags & NBD_FLAG_CAN_MULTI_CONN; } +static int +nbd_can_cache (void *handle) +{ + struct handle *h = handle; + + return h->flags & NBD_FLAG_SEND_CACHE; +} + static int nbd_can_extents (void *handle) { @@ -1245,6 +1253,18 @@ nbd_extents (void *handle, uint32_t count, uint64_t offset, return c < 0 ? c : nbd_reply (h, c); } +/* Cache a portion of the file. */ +static int +nbd_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + struct handle *h = handle; + int c; + + assert (!flags); + c = nbd_request (h, 0, NBD_CMD_CACHE, offset, count); + return c < 0 ? c : nbd_reply (h, c); +} + static struct nbdkit_plugin plugin = { .name = "nbd", .longname = "nbdkit nbd plugin", @@ -1264,12 +1284,14 @@ static struct nbdkit_plugin plugin = { .can_fua = nbd_can_fua, .can_multi_conn = nbd_can_multi_conn, .can_extents = nbd_can_extents, + .can_cache = nbd_can_cache, .pread = nbd_pread, .pwrite = nbd_pwrite, .zero = nbd_zero, .flush = nbd_flush, .trim = nbd_trim, .extents = nbd_extents, + .cache = nbd_cache, .errno_is_preserved = 1, }; -- 2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 5/9] plugins: Implement no-op .cache for in-memory plugins
For our plugins which have no backing file but generate everything on the fly or store things in memory, falling back to .pread on a cache request is just wasted work. Implement a no-op .cache callback for the drivers where there is no benefit to trying to cache anything. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/data/data.c | 10 ++++++++++ plugins/full/full.c | 12 +++++++++++- plugins/memory/memory.c | 10 ++++++++++ plugins/null/null.c | 12 +++++++++++- plugins/pattern/pattern.c | 12 +++++++++++- plugins/random/random.c | 12 +++++++++++- plugins/streaming/streaming.c | 11 +++++++++++ plugins/zero/zero.c | 12 +++++++++++- 8 files changed, 86 insertions(+), 5 deletions(-) diff --git a/plugins/data/data.c b/plugins/data/data.c index 55380c6..aaa3d2d 100644 --- a/plugins/data/data.c +++ b/plugins/data/data.c @@ -377,6 +377,15 @@ data_extents (void *handle, uint32_t count, uint64_t offset, return sparse_array_extents (sa, count, offset, extents); } +/* Cache. */ +static int +data_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + /* Everything is already in memory, falling back to .pread is + actually slower than treating this as a no-op. */ + return 0; +} + static struct nbdkit_plugin plugin = { .name = "data", .version = PACKAGE_VERSION, @@ -394,6 +403,7 @@ static struct nbdkit_plugin plugin = { .zero = data_zero, .trim = data_trim, .extents = data_extents, + .cache = data_cache, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ diff --git a/plugins/full/full.c b/plugins/full/full.c index 7661856..51a9d67 100644 --- a/plugins/full/full.c +++ b/plugins/full/full.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -137,6 +137,15 @@ full_extents (void *handle, uint32_t count, uint64_t offset, uint32_t flags, NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO); } +/* Cache. */ +static int +full_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + /* Everything is already in memory, falling back to .pread is + actually slower than treating this as a no-op. */ + return 0; +} + /* Note that we don't need to handle flush: If there has been previous * write then we have already returned an error. If there have been * no previous writes then flush can be ignored. @@ -156,6 +165,7 @@ static struct nbdkit_plugin plugin = { .zero = full_zero, .trim = full_trim, .extents = full_extents, + .cache = full_cache, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ diff --git a/plugins/memory/memory.c b/plugins/memory/memory.c index 90fa99e..baa29e2 100644 --- a/plugins/memory/memory.c +++ b/plugins/memory/memory.c @@ -172,6 +172,15 @@ memory_extents (void *handle, uint32_t count, uint64_t offset, return sparse_array_extents (sa, count, offset, extents); } +/* Cache. */ +static int +memory_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + /* Everything is already in memory, falling back to .pread is + actually slower than treating this as a no-op. */ + return 0; +} + static struct nbdkit_plugin plugin = { .name = "memory", .version = PACKAGE_VERSION, @@ -189,6 +198,7 @@ static struct nbdkit_plugin plugin = { .zero = memory_zero, .trim = memory_trim, .extents = memory_extents, + .cache = memory_cache, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ diff --git a/plugins/null/null.c b/plugins/null/null.c index 518b63b..5e40868 100644 --- a/plugins/null/null.c +++ b/plugins/null/null.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -140,6 +140,15 @@ null_extents (void *handle, uint32_t count, uint64_t offset, uint32_t flags, NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO); } +/* Cache. */ +static int +null_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + /* Everything is already in memory, falling back to .pread is + actually slower than treating this as a no-op. */ + return 0; +} + static struct nbdkit_plugin plugin = { .name = "null", .version = PACKAGE_VERSION, @@ -155,6 +164,7 @@ static struct nbdkit_plugin plugin = { .can_fua = null_can_fua, .flush = null_flush, .extents = null_extents, + .cache = null_cache, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ diff --git a/plugins/pattern/pattern.c b/plugins/pattern/pattern.c index 115bd96..e25da36 100644 --- a/plugins/pattern/pattern.c +++ b/plugins/pattern/pattern.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -117,6 +117,15 @@ pattern_pread (void *handle, void *buf, uint32_t count, uint64_t offset, return 0; } +/* Cache. */ +static int +pattern_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + /* Everything is already in memory, falling back to .pread is + actually slower than treating this as a no-op. */ + return 0; +} + static struct nbdkit_plugin plugin = { .name = "pattern", .version = PACKAGE_VERSION, @@ -127,6 +136,7 @@ static struct nbdkit_plugin plugin = { .get_size = pattern_get_size, .can_multi_conn = pattern_can_multi_conn, .pread = pattern_pread, + .cache = pattern_cache, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ diff --git a/plugins/random/random.c b/plugins/random/random.c index 7fb42c8..6219169 100644 --- a/plugins/random/random.c +++ b/plugins/random/random.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -146,6 +146,15 @@ random_pread (void *handle, void *buf, uint32_t count, uint64_t offset, return 0; } +/* Cache. */ +static int +random_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + /* Everything is already in memory, falling back to .pread is + actually slower than treating this as a no-op. */ + return 0; +} + static struct nbdkit_plugin plugin = { .name = "random", .version = PACKAGE_VERSION, @@ -157,6 +166,7 @@ static struct nbdkit_plugin plugin = { .get_size = random_get_size, .can_multi_conn = random_can_multi_conn, .pread = random_pread, + .cache = random_cache, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ diff --git a/plugins/streaming/streaming.c b/plugins/streaming/streaming.c index 4ca3e76..505ed91 100644 --- a/plugins/streaming/streaming.c +++ b/plugins/streaming/streaming.c @@ -246,6 +246,16 @@ streaming_pread (void *handle, void *buf, uint32_t count, uint64_t offset) return -1; } +/* Cache. */ +static int +streaming_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + /* nbdkit's default of falling back to .pread is pointless: it will + * fail for addresses already written, and waste a memset() for + * addresses not yet reached. Treat this as a no-op instead. */ + return 0; +} + static struct nbdkit_plugin plugin = { .name = "streaming", .longname = "nbdkit streaming plugin", @@ -259,6 +269,7 @@ static struct nbdkit_plugin plugin = { .get_size = streaming_get_size, .pwrite = streaming_pwrite, .pread = streaming_pread, + .cache = streaming_cache, .errno_is_preserved = 1, }; diff --git a/plugins/zero/zero.c b/plugins/zero/zero.c index 49ce08e..be8fc2a 100644 --- a/plugins/zero/zero.c +++ b/plugins/zero/zero.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2018 Red Hat Inc. + * Copyright (C) 2017-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -76,6 +76,15 @@ zero_pread (void *handle, void *buf, uint32_t count, uint64_t offset, return -1; } +/* Cache. */ +static int +zero_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + /* Everything is already in memory, falling back to .pread is + actually slower than treating this as a no-op. */ + return 0; +} + static struct nbdkit_plugin plugin = { .name = "zero", .version = PACKAGE_VERSION, @@ -83,6 +92,7 @@ static struct nbdkit_plugin plugin = { .open = zero_open, .get_size = zero_get_size, .pread = zero_pread, + .cache = zero_cache, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ -- 2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 6/9] sh: Implement .cache script callback
It's easy to expose new callbacks to sh plugins. I have no idea how any plugin would actually usefully cache anything in shell (you can't really store it in a shell variable, since a new shell process is started for each command); but if nothing else, this allows a user to implement .can_cache returning true to bypass nbdkit's normal fallback to .pread. The shell plugin, coupled with Rich's work on libnbd as a client-side library for actually exercising calls to NBD_CMD_CACHE, will be a useful way to prove that cache commands even make it through the stack. (Remember, qemu 3.0 was released with a fatally flawed NBD_CMD_CACHE server implementation, because there were no open source clients at the time that could actually send the command to test the server with). Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/sh/nbdkit-sh-plugin.pod | 12 +++++++++- plugins/sh/sh.c | 40 +++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+), 1 deletion(-) diff --git a/plugins/sh/nbdkit-sh-plugin.pod b/plugins/sh/nbdkit-sh-plugin.pod index 8af88b4..bfd58ab 100644 --- a/plugins/sh/nbdkit-sh-plugin.pod +++ b/plugins/sh/nbdkit-sh-plugin.pod @@ -218,9 +218,11 @@ This method is required. =item C<can_extents> +=item C<can_cache> + Unlike in other languages, you B<must> provide the C<can_*> methods otherwise they are assumed to all return false and your C<pwrite>, -C<flush>, C<trim>, C<zero> and C<extents> methods will never be +C<flush>, C<trim>, C<zero>, C<extents>, and C<cache> methods will never be called. The reason for this is obscure: In other languages we can detect if (eg) a C<pwrite> method is defined and synthesize an appropriate response if no actual C<can_write> method is defined. @@ -232,6 +234,7 @@ possible with this plugin. /path/to/script can_trim <handle> /path/to/script can_zero <handle> /path/to/script can_extents <handle> + /path/to/script can_cache <handle> The script should exit with code C<0> for true or code C<3> for false. @@ -334,6 +337,13 @@ Unlike in other languages, if you provide an C<extents> method you B<must> also provide a C<can_extents> method which exits with code C<0> (true). +=item C<cache> + + /path/to/script cache <handle> <count> <offset> + +Unlike in other languages, if you provide a C<cache> method you B<must> +also provide a C<can_cache> method which exits with code C<0> (true). + =back =head2 Missing callbacks diff --git a/plugins/sh/sh.c b/plugins/sh/sh.c index a5beb57..7aded44 100644 --- a/plugins/sh/sh.c +++ b/plugins/sh/sh.c @@ -578,6 +578,12 @@ sh_can_multi_conn (void *handle) return boolean_method (handle, "can_multi_conn"); } +static int +sh_can_cache (void *handle) +{ + return boolean_method (handle, "can_cache"); +} + static int sh_flush (void *handle, uint32_t flags) { @@ -782,6 +788,38 @@ sh_extents (void *handle, uint32_t count, uint64_t offset, uint32_t flags, } } +static int +sh_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags) +{ + char *h = handle; + char cbuf[32], obuf[32]; + const char *args[] = { script, "cache", h, cbuf, obuf, NULL }; + + snprintf (cbuf, sizeof cbuf, "%" PRIu32, count); + snprintf (obuf, sizeof obuf, "%" PRIu64, offset); + assert (!flags); + + switch (call (args)) { + case OK: + return 0; + + case MISSING: + /* Ignore lack of cache callback. */ + return 0; + + case ERROR: + return -1; + + case RET_FALSE: + nbdkit_error ("%s: %s method returned unexpected code (3/false)", + script, "cache"); + errno = EIO; + return -1; + + default: abort (); + } +} + #define sh_config_help \ "script=<FILENAME> (required) The shell script to run.\n" \ "[other arguments may be used by the plugin that you load]" @@ -812,6 +850,7 @@ static struct nbdkit_plugin plugin = { .can_extents = sh_can_extents, .can_fua = sh_can_fua, .can_multi_conn = sh_can_multi_conn, + .can_cache = sh_can_cache, .pread = sh_pread, .pwrite = sh_pwrite, @@ -819,6 +858,7 @@ static struct nbdkit_plugin plugin = { .trim = sh_trim, .zero = sh_zero, .extents = sh_extents, + .cache = sh_cache, .errno_is_preserved = 1, }; -- 2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 7/9] filters: Add .cache callback
Make it possible for filters to adjust the behavior for NBD_CMD_CACHE. To avoid any 'git bisect' breakage, this patch defaults .can_cache to false until all necessary filters have been patched first. Signed-off-by: Eric Blake <eblake@redhat.com> --- docs/nbdkit-filter.pod | 27 ++++++++++++++++++++++++++- include/nbdkit-filter.h | 8 ++++++++ server/filters.c | 39 +++++++++++++++++++++++++++++++++------ 3 files changed, 67 insertions(+), 7 deletions(-) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index 6aeaa7b..787333a 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -356,6 +356,8 @@ calls. =head2 C<.can_multi_conn> +=head2 C<.can_cache> + int (*can_write) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); int (*can_flush) (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -373,6 +375,8 @@ calls. void *handle); int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_cache) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); These intercept the corresponding plugin methods, and control feature bits advertised to the client. @@ -597,6 +601,27 @@ Returns the number of extents in the list. Returns a copy of the C<i>'th extent. +=head2 C<.cache> + + int (*cache) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, + uint32_t flags, int *err); + +This intercepts the plugin C<.cache> method and can be used to modify +cache requests. + +This function will not be called if C<.can_cache> returned false; in +turn, the filter should not call C<next_ops-E<gt>cache> if +C<next_ops-E<gt>can_cache> did not return true. + +The parameter C<flags> exists in case of future NBD protocol +extensions; at this time, it will be 0 on input, and the filter should +not pass any flags to C<next_ops-E<gt>cache>. + +If there is an error, C<.cache> should call C<nbdkit_error> with an +error message B<and> return -1 with C<err> set to the positive errno +value to return to the client. + =head1 ERROR HANDLING If there is an error in the filter itself, the filter should call @@ -708,4 +733,4 @@ Richard W.M. Jones =head1 COPYRIGHT -Copyright (C) 2013-2018 Red Hat Inc. +Copyright (C) 2013-2019 Red Hat Inc. diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h index 9b6cd6e..5893dd8 100644 --- a/include/nbdkit-filter.h +++ b/include/nbdkit-filter.h @@ -74,6 +74,7 @@ struct nbdkit_next_ops { int (*can_extents) (void *nxdata); int (*can_fua) (void *nxdata); int (*can_multi_conn) (void *nxdata); + int (*can_cache) (void *nxdata); int (*pread) (void *nxdata, void *buf, uint32_t count, uint64_t offset, uint32_t flags, int *err); @@ -87,6 +88,8 @@ struct nbdkit_next_ops { int *err); int (*extents) (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, struct nbdkit_extents *extents, int *err); + int (*cache) (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, + int *err); }; struct nbdkit_filter { @@ -142,6 +145,8 @@ struct nbdkit_filter { void *handle); int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_cache) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); int (*pread) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, void *buf, uint32_t count, uint64_t offset, @@ -161,6 +166,9 @@ struct nbdkit_filter { int (*extents) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, uint32_t count, uint64_t offset, uint32_t flags, struct nbdkit_extents *extents, int *err); + int (*cache) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, uint32_t flags, + int *err); }; #define NBDKIT_REGISTER_FILTER(filter) \ diff --git a/server/filters.c b/server/filters.c index c619fd6..a48f67e 100644 --- a/server/filters.c +++ b/server/filters.c @@ -329,6 +329,13 @@ next_can_multi_conn (void *nxdata) return b_conn->b->can_multi_conn (b_conn->b, b_conn->conn); } +static int +next_can_cache (void *nxdata) +{ + struct b_conn *b_conn = nxdata; + return b_conn->b->can_cache (b_conn->b, b_conn->conn); +} + static int next_pread (void *nxdata, void *buf, uint32_t count, uint64_t offset, uint32_t flags, int *err) @@ -379,6 +386,15 @@ next_extents (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags, extents, err); } +static int +next_cache (void *nxdata, uint32_t count, uint64_t offset, + uint32_t flags, int *err) +{ + struct b_conn *b_conn = nxdata; + return b_conn->b->cache (b_conn->b, b_conn->conn, count, offset, flags, + err); +} + static struct nbdkit_next_ops next_ops = { .get_size = next_get_size, .can_write = next_can_write, @@ -389,12 +405,14 @@ static struct nbdkit_next_ops next_ops = { .can_extents = next_can_extents, .can_fua = next_can_fua, .can_multi_conn = next_can_multi_conn, + .can_cache = next_can_cache, .pread = next_pread, .pwrite = next_pwrite, .flush = next_flush, .trim = next_trim, .zero = next_zero, .extents = next_extents, + .cache = next_cache, }; static int @@ -577,12 +595,16 @@ static int filter_can_cache (struct backend *b, struct connection *conn) { struct backend_filter *f = container_of (b, struct backend_filter, backend); + void *handle = connection_get_handle (conn, f->backend.i); + struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; debug ("%s: can_cache", f->name); - /* FIXME: Default to f->backend.next->can_cache, once all filters - have been audited */ - return 0; + if (f->filter.can_cache) + return f->filter.can_cache (&next_ops, &nxdata, handle); + else + return 0; /* FIXME - allow passthrough once all filters are audited */ + return f->backend.next->can_cache (f->backend.next, conn); } static int @@ -720,15 +742,20 @@ filter_cache (struct backend *b, struct connection *conn, uint32_t flags, int *err) { struct backend_filter *f = container_of (b, struct backend_filter, backend); + void *handle = connection_get_handle (conn, f->backend.i); + struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; assert (flags == 0); debug ("%s: cache count=%" PRIu32 " offset=%" PRIu64 " flags=0x%" PRIx32, f->name, count, offset, flags); - /* FIXME: Allow filter to rewrite request */ - return f->backend.next->cache (f->backend.next, conn, - count, offset, flags, err); + if (f->filter.cache) + return f->filter.cache (&next_ops, &nxdata, handle, + count, offset, flags, err); + else + return f->backend.next->cache (f->backend.next, conn, + count, offset, flags, err); } static struct backend filter_functions = { -- 2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 8/9] test-layers: Test .cache usage
This also forms the first instance that I am aware of where an open source client actually sends NBD_CMD_CACHE to the server - even if the client is just a test app (the nbd plugin doesn't count - it won't send to the server unless a client sends to the plugin first). Signed-off-by: Eric Blake <eblake@redhat.com> --- tests/test-layers-filter.c | 22 +++++++++++++++++++++- tests/test-layers-plugin.c | 17 +++++++++++++++++ tests/test-layers.c | 36 ++++++++++++++++++++++++++++++++++++ 3 files changed, 74 insertions(+), 1 deletion(-) diff --git a/tests/test-layers-filter.c b/tests/test-layers-filter.c index a8f4723..bd063bd 100644 --- a/tests/test-layers-filter.c +++ b/tests/test-layers-filter.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018 Red Hat Inc. + * Copyright (C) 2018-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -184,6 +184,15 @@ test_layers_filter_can_extents (struct nbdkit_next_ops *next_ops, return next_ops->can_extents (nxdata); } +static int +test_layers_filter_can_cache (struct nbdkit_next_ops *next_ops, + void *nxdata, + void *handle) +{ + DEBUG_FUNCTION; + return next_ops->can_cache (nxdata); +} + static int test_layers_filter_pread (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, void *buf, @@ -241,6 +250,15 @@ test_layers_filter_extents (struct nbdkit_next_ops *next_ops, void *nxdata, return next_ops->extents (nxdata, count, offset, flags, extents, err); } +static int +test_layers_filter_cache (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, + uint32_t flags, int *err) +{ + DEBUG_FUNCTION; + return next_ops->cache (nxdata, count, offset, flags, err); +} + static struct nbdkit_filter filter = { .name = "testlayers" layer, .version = PACKAGE_VERSION, @@ -262,12 +280,14 @@ static struct nbdkit_filter filter = { .can_fua = test_layers_filter_can_fua, .can_multi_conn = test_layers_filter_can_multi_conn, .can_extents = test_layers_filter_can_extents, + .can_cache = test_layers_filter_can_cache, .pread = test_layers_filter_pread, .pwrite = test_layers_filter_pwrite, .flush = test_layers_filter_flush, .trim = test_layers_filter_trim, .zero = test_layers_filter_zero, .extents = test_layers_filter_extents, + .cache = test_layers_filter_cache, }; NBDKIT_REGISTER_FILTER(filter) diff --git a/tests/test-layers-plugin.c b/tests/test-layers-plugin.c index f9b2014..981641e 100644 --- a/tests/test-layers-plugin.c +++ b/tests/test-layers-plugin.c @@ -143,6 +143,13 @@ test_layers_plugin_can_multi_conn (void *handle) return 1; } +static int +test_layers_plugin_can_cache (void *handle) +{ + DEBUG_FUNCTION; + return 1; +} + static int test_layers_plugin_can_extents (void *handle) { @@ -201,6 +208,14 @@ test_layers_plugin_extents (void *handle, return nbdkit_add_extent (extents, offset, count, 0); } +static int +test_layers_plugin_cache (void *handle, + uint32_t count, uint64_t offset, uint32_t flags) +{ + DEBUG_FUNCTION; + return 0; +} + static struct nbdkit_plugin plugin = { .name = "testlayersplugin", .version = PACKAGE_VERSION, @@ -220,12 +235,14 @@ static struct nbdkit_plugin plugin = { .can_fua = test_layers_plugin_can_fua, .can_multi_conn = test_layers_plugin_can_multi_conn, .can_extents = test_layers_plugin_can_extents, + .can_cache = test_layers_plugin_can_cache, .pread = test_layers_plugin_pread, .pwrite = test_layers_plugin_pwrite, .flush = test_layers_plugin_flush, .trim = test_layers_plugin_trim, .zero = test_layers_plugin_zero, .extents = test_layers_plugin_extents, + .cache = test_layers_plugin_cache, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. */ diff --git a/tests/test-layers.c b/tests/test-layers.c index 627e4ec..a820ba5 100644 --- a/tests/test-layers.c +++ b/tests/test-layers.c @@ -361,6 +361,12 @@ main (int argc, char *argv[]) "filter1: test_layers_filter_can_extents", "test_layers_plugin_can_extents", NULL); + log_verify_seen_in_order + ("filter3: test_layers_filter_can_cache", + "filter2: test_layers_filter_can_cache", + "filter1: test_layers_filter_can_cache", + "test_layers_plugin_can_cache", + NULL); fprintf (stderr, "%s: protocol connected\n", program_name); @@ -526,6 +532,36 @@ main (int argc, char *argv[]) "test_layers_plugin_zero", NULL); + request.type = htobe16 (NBD_CMD_CACHE); + request.offset = htobe64 (0); + request.count = htobe32 (512); + request.flags = htobe16 (0); + if (send (sock, &request, sizeof request, 0) != sizeof request) { + perror ("send: NBD_CMD_CACHE"); + exit (EXIT_FAILURE); + } + if (recv (sock, &reply, sizeof reply, MSG_WAITALL) != sizeof reply) { + perror ("recv: NBD_CMD_CACHE"); + exit (EXIT_FAILURE); + } + if (reply.error != NBD_SUCCESS) { + fprintf (stderr, "%s: NBD_CMD_CACHE failed with %d\n", + program_name, reply.error); + exit (EXIT_FAILURE); + } + + sleep (1); + log_verify_seen_in_order + ("testlayersfilter3: cache count=512 offset=0 flags=0x0", + "filter3: test_layers_filter_cache", + "testlayersfilter2: cache count=512 offset=0 flags=0x0", + "filter2: test_layers_filter_cache", + "testlayersfilter1: cache count=512 offset=0 flags=0x0", + "filter1: test_layers_filter_cache", + "testlayersplugin: debug: cache count=512 offset=0", + "test_layers_plugin_cache", + NULL); + /* XXX We should test NBD_CMD_BLOCK_STATUS here. However it * requires that we negotiate structured replies and base:allocation * in the handshake, and the format of the reply is more complex -- 2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 9/9] blocksize: Implement .cache rounding
Rely on .can_cache passthrough to imply that our .cache won't be called unless the plugin also has .cache. [Technically, that won't happen until a later patch flips the default in filters.c]. Round the cache request out, to cache the same range as would otherwise be passed to the plugin's .pread if we had set .can_cache to false. Fortunately, when maxlen is larger than maxblock, and the plugin supports .cache, this results in a lot fewer calls into the plugin than the .pread fallback. Oddly enough, a client can submit an unaligned request for just under 4G of caching where our rounding would overflow a 32-bit integer, so our rounding has to use a 64-bit temporary. Signed-off-by: Eric Blake <eblake@redhat.com> --- filters/blocksize/blocksize.c | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/filters/blocksize/blocksize.c b/filters/blocksize/blocksize.c index 4f3e9a3..2d28222 100644 --- a/filters/blocksize/blocksize.c +++ b/filters/blocksize/blocksize.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018 Red Hat Inc. + * Copyright (C) 2018-2019 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -389,6 +389,34 @@ blocksize_extents (struct nbdkit_next_ops *next_ops, void *nxdata, flags, extents, err); } +static int +blocksize_cache (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offs, uint32_t flags, + int *err) +{ + uint32_t limit; + uint64_t remaining = count; /* Rounding out could exceed 32 bits */ + + /* Unaligned head */ + limit = offs & (minblock - 1); + remaining += limit; + offs -= limit; + + /* Unaligned tail */ + remaining = ROUND_UP (remaining, minblock); + + /* Aligned body */ + while (remaining) { + limit = MIN (maxdata, remaining); + if (next_ops->cache (nxdata, limit, offs, flags, err) == -1) + return -1; + offs += limit; + remaining -= limit; + } + + return 0; +} + static struct nbdkit_filter filter = { .name = "blocksize", .longname = "nbdkit blocksize filter", @@ -404,6 +432,7 @@ static struct nbdkit_filter filter = { .trim = blocksize_trim, .zero = blocksize_zero, .extents = blocksize_extents, + .cache = blocksize_cache, }; NBDKIT_REGISTER_FILTER(filter) -- 2.20.1
Richard W.M. Jones
2019-May-13 07:56 UTC
Re: [Libguestfs] [nbdkit PATCH 1/9] server: Internal hooks for implementing NBD_CMD_CACHE
On Thu, May 09, 2019 at 10:03:34PM -0500, Eric Blake wrote:> Until the next few patches expose and implement new callbacks for > filters and plugins, our initial implementation for NBD_CMD_CACHE is > to just blindly advertise the feature, and call into .pread with an > ignored buffer. > > Note that for bisection reasons, this patch treats any use of a filter > as a forced .can_cache of false to bypass the filter's caching; once > all affected filters are patched to handle cache requests correctly, a > later patch will then switch the filter default to passthrough for the > sake of remaining filters. > > Signed-off-by: Eric Blake <eblake@redhat.com> > > --- > > Note: we could also choose to always advertise caching, but make the > nbdkit default version be a no-op rather than call out to .pread. For > that matter, maybe a plugin's .can_cache should allow the plugin to > choose between a tri-state of no-op, fallback to .pread, or call > .cache; I'm not sure which is the saner default for a random plugin > compiled against older nbdkit. At least it's fairly easy to write a > filter that changes the default from no-op to .pread or from .pread to > no-op. Be thinking about that as you review the rest of the series.I'm worried that we end up in the same situation that we do with .zero: because we emulate it, we end up with a slow zero support, whereas clients [well, Nir S. mainly] actually want NBD_FLAG_SEND_WRITE_ZEROES == *fast* zeroing is available. In particular in the NBD_CMD_CACHE case: for some plugins emulating this behind the scenes with pread makes some sense. eg. It works for nbdkit-file-plugin because it will prefetch the data into Linux's page cache. However if a plugin makes a network access, eg. nbdkit-vddk-plugin, that can be both very expensive and has no prefetching benefit. Therefore I do think we should default to ignoring NBD_CMD_CACHE commands and *not* advertising the feature, except when the plugin says that it is able to handle it. For tri-stating .can_cache: yes that makes sense, because there is moderate difficulty for a plugin to emulate .cache using .pread. The plugin has to at least allocate a dummy buffer and throw it away, and we could probably do something more efficient than that if we implement an emulate mode in the server. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Richard W.M. Jones
2019-May-13 08:01 UTC
Re: [Libguestfs] [nbdkit PATCH 6/9] sh: Implement .cache script callback
For this plugin, if a tri-state can_cache is implemented, the code from sh_can_fua could be reused: https://github.com/libguestfs/nbdkit/blob/7a4f38f98920a1192e5c50ea9bd212e759e3f4e7/plugins/sh/sh.c#L530 Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Possibly Parallel Threads
- [nbdkit PATCH v2 00/24] implement NBD_CMD_CACHE
- cross-project patches: Add NBD Fast Zero support
- [nbdkit PATCH 0/2] More on .thread_model
- [nbdkit PATCH 1/9] server: Internal hooks for implementing NBD_CMD_CACHE
- [PATCH nbdkit 0/3] tests: Test export flags (eflags).