Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 0/9] RFC: implement NBD_CMD_CACHE
I'm still working my way through the filters before this series will be complete, but this is enough of a start to at least get some feedback on the idea of implementing another NBD protocol extension. Eric Blake (9): server: Internal hooks for implementing NBD_CMD_CACHE plugins: Add .cache callback file, split: Implement .cache with posix_fadvise nbd: Implement NBD_CMD_CACHE passthrough plugins: Implement no-op .cache for in-memory plugins sh: Implement .cache script callback filters: Add .cache callback test-layers: Test .cache usage blocksize: Implement .cache rounding docs/nbdkit-filter.pod | 27 ++++++++++++++- docs/nbdkit-plugin.pod | 52 ++++++++++++++++++++++++++++ docs/nbdkit-protocol.pod | 10 ++++++ plugins/sh/nbdkit-sh-plugin.pod | 12 ++++++- configure.ac | 3 +- common/protocol/protocol.h | 2 ++ include/nbdkit-filter.h | 8 +++++ include/nbdkit-plugin.h | 2 ++ server/internal.h | 4 +++ filters/blocksize/blocksize.c | 31 ++++++++++++++++- plugins/data/data.c | 10 ++++++ plugins/file/file.c | 22 ++++++++++++ plugins/full/full.c | 12 ++++++- plugins/memory/memory.c | 10 ++++++ plugins/nbd/nbd.c | 22 ++++++++++++ plugins/null/null.c | 12 ++++++- plugins/pattern/pattern.c | 12 ++++++- plugins/random/random.c | 12 ++++++- plugins/sh/sh.c | 40 ++++++++++++++++++++++ plugins/split/split.c | 37 +++++++++++++++++++- plugins/streaming/streaming.c | 11 ++++++ plugins/zero/zero.c | 12 ++++++- server/filters.c | 60 ++++++++++++++++++++++++++++++++- server/plugins.c | 39 +++++++++++++++++++++ server/protocol-handshake.c | 10 +++++- server/protocol.c | 25 ++++++++++++++ tests/test-layers-filter.c | 22 +++++++++++- tests/test-layers-plugin.c | 17 ++++++++++ tests/test-layers.c | 36 ++++++++++++++++++++ tests/test-eflags.sh | 53 +++++++++++++++-------------- 30 files changed, 587 insertions(+), 38 deletions(-) -- 2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 1/9] server: Internal hooks for implementing NBD_CMD_CACHE
Until the next few patches expose and implement new callbacks for
filters and plugins, our initial implementation for NBD_CMD_CACHE is
to just blindly advertise the feature, and call into .pread with an
ignored buffer.
Note that for bisection reasons, this patch treats any use of a filter
as a forced .can_cache of false to bypass the filter's caching; once
all affected filters are patched to handle cache requests correctly, a
later patch will then switch the filter default to passthrough for the
sake of remaining filters.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
Note: we could also choose to always advertise caching, but make the
nbdkit default version be a no-op rather than call out to .pread. For
that matter, maybe a plugin's .can_cache should allow the plugin to
choose between a tri-state of no-op, fallback to .pread, or call
.cache; I'm not sure which is the saner default for a random plugin
compiled against older nbdkit. At least it's fairly easy to write a
filter that changes the default from no-op to .pread or from .pread to
no-op. Be thinking about that as you review the rest of the series.
---
docs/nbdkit-protocol.pod | 10 +++++++
common/protocol/protocol.h | 2 ++
server/internal.h | 4 +++
server/filters.c | 33 ++++++++++++++++++++++-
server/plugins.c | 34 ++++++++++++++++++++++++
server/protocol-handshake.c | 10 ++++++-
server/protocol.c | 25 +++++++++++++++++
tests/test-eflags.sh | 53 ++++++++++++++++++++-----------------
8 files changed, 144 insertions(+), 27 deletions(-)
diff --git a/docs/nbdkit-protocol.pod b/docs/nbdkit-protocol.pod
index f706cfd..6b85c14 100644
--- a/docs/nbdkit-protocol.pod
+++ b/docs/nbdkit-protocol.pod
@@ -152,6 +152,16 @@ when structured replies are in effect. However, the flag is
a no-op
until we extend the plugin API to allow a fragmented read in the first
place.
+=item C<NBD_CMD_CACHE>
+
+Supported in nbdkit E<ge> 1.13.4.
+
+This protocol extension allows a client to inform the server about
+intent to access a portion of the export, to allow the server an
+opportunity to cache things appropriately. For plugins that do not
+support a particular form of caching, nbdkit performs pread then
+ignores the results.
+
=item Resize Extension
I<Not supported>.
diff --git a/common/protocol/protocol.h b/common/protocol/protocol.h
index c27104c..e938643 100644
--- a/common/protocol/protocol.h
+++ b/common/protocol/protocol.h
@@ -95,6 +95,7 @@ extern const char *name_of_nbd_flag (int);
#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)
#define NBD_FLAG_SEND_DF (1 << 7)
#define NBD_FLAG_CAN_MULTI_CONN (1 << 8)
+#define NBD_FLAG_SEND_CACHE (1 << 10)
/* NBD options (new style handshake only). */
extern const char *name_of_nbd_opt (int);
@@ -217,6 +218,7 @@ extern const char *name_of_nbd_cmd (int);
#define NBD_CMD_DISC 2 /* Disconnect. */
#define NBD_CMD_FLUSH 3
#define NBD_CMD_TRIM 4
+#define NBD_CMD_CACHE 5
#define NBD_CMD_WRITE_ZEROES 6
#define NBD_CMD_BLOCK_STATUS 7
diff --git a/server/internal.h b/server/internal.h
index 67fccfc..6414a78 100644
--- a/server/internal.h
+++ b/server/internal.h
@@ -170,6 +170,7 @@ struct connection {
bool can_zero;
bool can_fua;
bool can_multi_conn;
+ bool can_cache;
bool can_extents;
bool using_tls;
bool structured_replies;
@@ -276,6 +277,7 @@ struct backend {
int (*can_extents) (struct backend *, struct connection *conn);
int (*can_fua) (struct backend *, struct connection *conn);
int (*can_multi_conn) (struct backend *, struct connection *conn);
+ int (*can_cache) (struct backend *, struct connection *conn);
int (*pread) (struct backend *, struct connection *conn, void *buf,
uint32_t count, uint64_t offset, uint32_t flags, int *err);
@@ -290,6 +292,8 @@ struct backend {
int (*extents) (struct backend *, struct connection *conn, uint32_t count,
uint64_t offset, uint32_t flags,
struct nbdkit_extents *extents, int *err);
+ int (*cache) (struct backend *, struct connection *conn, uint32_t count,
+ uint64_t offset, uint32_t flags, int *err);
};
/* plugins.c */
diff --git a/server/filters.c b/server/filters.c
index b73e74f..c619fd6 100644
--- a/server/filters.c
+++ b/server/filters.c
@@ -1,5 +1,5 @@
/* nbdkit
- * Copyright (C) 2013-2018 Red Hat Inc.
+ * Copyright (C) 2013-2019 Red Hat Inc.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
@@ -573,6 +573,18 @@ filter_can_multi_conn (struct backend *b, struct connection
*conn)
return f->backend.next->can_multi_conn (f->backend.next, conn);
}
+static int
+filter_can_cache (struct backend *b, struct connection *conn)
+{
+ struct backend_filter *f = container_of (b, struct backend_filter, backend);
+
+ debug ("%s: can_cache", f->name);
+
+ /* FIXME: Default to f->backend.next->can_cache, once all filters
+ have been audited */
+ return 0;
+}
+
static int
filter_pread (struct backend *b, struct connection *conn,
void *buf, uint32_t count, uint64_t offset,
@@ -702,6 +714,23 @@ filter_extents (struct backend *b, struct connection *conn,
extents, err);
}
+static int
+filter_cache (struct backend *b, struct connection *conn,
+ uint32_t count, uint64_t offset,
+ uint32_t flags, int *err)
+{
+ struct backend_filter *f = container_of (b, struct backend_filter, backend);
+
+ assert (flags == 0);
+
+ debug ("%s: cache count=%" PRIu32 " offset=%" PRIu64
" flags=0x%" PRIx32,
+ f->name, count, offset, flags);
+
+ /* FIXME: Allow filter to rewrite request */
+ return f->backend.next->cache (f->backend.next, conn,
+ count, offset, flags, err);
+}
+
static struct backend filter_functions = {
.free = filter_free,
.thread_model = filter_thread_model,
@@ -726,12 +755,14 @@ static struct backend filter_functions = {
.can_extents = filter_can_extents,
.can_fua = filter_can_fua,
.can_multi_conn = filter_can_multi_conn,
+ .can_cache = filter_can_cache,
.pread = filter_pread,
.pwrite = filter_pwrite,
.flush = filter_flush,
.trim = filter_trim,
.zero = filter_zero,
.extents = filter_extents,
+ .cache = filter_cache,
};
/* Register and load a filter. */
diff --git a/server/plugins.c b/server/plugins.c
index e26d133..cabf543 100644
--- a/server/plugins.c
+++ b/server/plugins.c
@@ -448,6 +448,17 @@ plugin_can_multi_conn (struct backend *b, struct connection
*conn)
return 0; /* assume false */
}
+static int
+plugin_can_cache (struct backend *b, struct connection *conn)
+{
+ assert (connection_get_handle (conn, 0));
+
+ debug ("can_cache");
+
+ /* FIXME: return default based on plugin->cache */
+ return 0;
+}
+
/* Plugins and filters can call this to set the true errno, in cases
* where !errno_is_preserved.
*/
@@ -693,6 +704,27 @@ plugin_extents (struct backend *b, struct connection *conn,
return r;
}
+static int
+plugin_cache (struct backend *b, struct connection *conn,
+ uint32_t count, uint64_t offset, uint32_t flags,
+ int *err)
+{
+ struct backend_plugin *p = container_of (b, struct backend_plugin, backend);
+ int r = -1;
+
+ assert (connection_get_handle (conn, 0));
+ assert (!flags);
+
+ debug ("cache count=%" PRIu32 " offset=%" PRIu64, count,
offset);
+
+ /* FIXME: assert plugin->cache and call it */
+ assert (false);
+
+ if (r == -1)
+ *err = get_error (p);
+ return r;
+}
+
static struct backend plugin_functions = {
.free = plugin_free,
.thread_model = plugin_thread_model,
@@ -717,12 +749,14 @@ static struct backend plugin_functions = {
.can_extents = plugin_can_extents,
.can_fua = plugin_can_fua,
.can_multi_conn = plugin_can_multi_conn,
+ .can_cache = plugin_can_cache,
.pread = plugin_pread,
.pwrite = plugin_pwrite,
.flush = plugin_flush,
.trim = plugin_trim,
.zero = plugin_zero,
.extents = plugin_extents,
+ .cache = plugin_cache,
};
/* Register and load a plugin. */
diff --git a/server/protocol-handshake.c b/server/protocol-handshake.c
index 03377a9..af1cd18 100644
--- a/server/protocol-handshake.c
+++ b/server/protocol-handshake.c
@@ -109,7 +109,7 @@ protocol_compute_eflags (struct connection *conn, uint16_t
*flags)
conn->can_multi_conn = true;
}
- /* The result of this is not returned to callers here (or at any
+ /* The results of the next two are not returned to callers here (or at any
* time during the handshake). However it makes sense to do it once
* per connection and store the result in the handle anyway. This
* protocol_compute_eflags function is a bit misnamed XXX.
@@ -120,6 +120,14 @@ protocol_compute_eflags (struct connection *conn, uint16_t
*flags)
if (fl)
conn->can_extents = true;
+ fl = backend->can_cache (backend, conn);
+ if (fl == -1)
+ return -1;
+ if (fl)
+ conn->can_cache = true;
+ /* We emulate cache even when plugin doesn't support it */
+ eflags |= NBD_FLAG_SEND_CACHE;
+
if (conn->structured_replies)
eflags |= NBD_FLAG_SEND_DF;
diff --git a/server/protocol.c b/server/protocol.c
index 01d4c71..69227e5 100644
--- a/server/protocol.c
+++ b/server/protocol.c
@@ -76,6 +76,7 @@ validate_request (struct connection *conn,
/* Validate cmd, offset, count. */
switch (cmd) {
case NBD_CMD_READ:
+ case NBD_CMD_CACHE:
case NBD_CMD_WRITE:
case NBD_CMD_TRIM:
case NBD_CMD_WRITE_ZEROES:
@@ -254,6 +255,30 @@ handle_request (struct connection *conn,
return err;
break;
+ case NBD_CMD_CACHE:
+ /* The other backend methods don't check can_*. That is because
+ * those methods are implicitly suppressed by returning fewer
+ * eflags to the client. However the eflag for cache is always
+ * sent, because we emulate it here when the plugin lacks it.
+ */
+ if (conn->can_cache) {
+ if (backend->cache (backend, conn, count, offset, 0, &err) == -1)
+ return err;
+ }
+ else {
+ static char buf[MAX_REQUEST_SIZE]; /* data sink, never read */
+ uint32_t limit;
+
+ while (count) {
+ limit = MIN (count, sizeof buf);
+ if (backend->pread (backend, conn, buf, limit, offset, flags,
+ &err) == -1)
+ return err;
+ count -= limit;
+ }
+ }
+ break;
+
case NBD_CMD_WRITE_ZEROES:
if (!(flags & NBD_CMD_FLAG_NO_HOLE))
f |= NBDKIT_FLAG_MAY_TRIM;
diff --git a/tests/test-eflags.sh b/tests/test-eflags.sh
index 84dcfe8..0234aca 100755
--- a/tests/test-eflags.sh
+++ b/tests/test-eflags.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash
# nbdkit
-# Copyright (C) 2018 Red Hat Inc.
+# Copyright (C) 2018-2019 Red Hat Inc.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
@@ -94,8 +94,8 @@ case "$1" in
esac
EOF
-[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] ||
- fail "expected HAS_FLAGS|READ_ONLY|SEND_DF"
+[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE )) ] ||
+ fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE"
#----------------------------------------------------------------------
# -r
@@ -108,8 +108,8 @@ case "$1" in
esac
EOF
-[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] ||
- fail "expected HAS_FLAGS|READ_ONLY|SEND_DF"
+[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE )) ] ||
+ fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE"
#----------------------------------------------------------------------
# can_write=true
@@ -126,8 +126,8 @@ case "$1" in
esac
EOF
-[ $eflags -eq $(( HAS_FLAGS|SEND_WRITE_ZEROES|SEND_DF )) ] ||
- fail "expected HAS_FLAGS|SEND_WRITE_ZEROES|SEND_DF"
+[ $eflags -eq $(( HAS_FLAGS|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE )) ] ||
+ fail "expected HAS_FLAGS|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE"
#----------------------------------------------------------------------
# -r
@@ -144,8 +144,8 @@ case "$1" in
esac
EOF
-[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] ||
- fail "expected HAS_FLAGS|READ_ONLY|SEND_DF"
+[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE )) ] ||
+ fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE"
#----------------------------------------------------------------------
# can_write=false
@@ -164,8 +164,8 @@ case "$1" in
esac
EOF
-[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] ||
- fail "expected HAS_FLAGS|READ_ONLY|SEND_DF"
+[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE )) ] ||
+ fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE"
#----------------------------------------------------------------------
# -r
@@ -185,8 +185,8 @@ case "$1" in
esac
EOF
-[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] ||
- fail "expected HAS_FLAGS|READ_ONLY|SEND_DF"
+[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE )) ] ||
+ fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE"
#----------------------------------------------------------------------
# can_write=true
@@ -201,8 +201,8 @@ case "$1" in
esac
EOF
-[ $eflags -eq $(( HAS_FLAGS|SEND_TRIM|SEND_WRITE_ZEROES|SEND_DF )) ] ||
- fail "expected HAS_FLAGS|SEND_TRIM|SEND_WRITE_ZEROES|SEND_DF"
+[ $eflags -eq $(( HAS_FLAGS|SEND_TRIM|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE )) ]
||
+ fail "expected
HAS_FLAGS|SEND_TRIM|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE"
#----------------------------------------------------------------------
# can_write=true
@@ -217,8 +217,8 @@ case "$1" in
esac
EOF
-[ $eflags -eq $(( HAS_FLAGS|ROTATIONAL|SEND_WRITE_ZEROES|SEND_DF )) ] ||
- fail "expected HAS_FLAGS|ROTATIONAL|SEND_WRITE_ZEROES|SEND_DF"
+[ $eflags -eq $(( HAS_FLAGS|ROTATIONAL|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE ))
] ||
+ fail "expected
HAS_FLAGS|ROTATIONAL|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE"
#----------------------------------------------------------------------
# -r
@@ -234,8 +234,8 @@ case "$1" in
esac
EOF
-[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|ROTATIONAL|SEND_DF )) ] ||
- fail "expected HAS_FLAGS|READ_ONLY|ROTATIONAL|SEND_DF"
+[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|ROTATIONAL|SEND_DF|SEND_CACHE )) ] ||
+ fail "expected HAS_FLAGS|READ_ONLY|ROTATIONAL|SEND_DF|SEND_CACHE"
#----------------------------------------------------------------------
# can_write=true
@@ -250,8 +250,8 @@ case "$1" in
esac
EOF
-[ $eflags -eq $(( HAS_FLAGS|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF )) ] ||
- fail "expected HAS_FLAGS|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF"
+[ $eflags -eq $(( HAS_FLAGS|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE )) ]
||
+ fail "expected
HAS_FLAGS|SEND_FUA|SEND_WRITE_ZEROES|SEND_DF|SEND_CACHE"
#----------------------------------------------------------------------
# -r
@@ -269,8 +269,8 @@ case "$1" in
esac
EOF
-[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF )) ] ||
- fail "expected HAS_FLAGS|READ_ONLY|SEND_DF"
+[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE )) ] ||
+ fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|SEND_CACHE"
#----------------------------------------------------------------------
# -r
@@ -284,5 +284,8 @@ case "$1" in
esac
EOF
-[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|CAN_MULTI_CONN )) ] ||
- fail "expected HAS_FLAGS|READ_ONLY|SEND_DF|CAN_MULTI_CONN"
+[ $eflags -eq $(( HAS_FLAGS|READ_ONLY|SEND_DF|CAN_MULTI_CONN|SEND_CACHE )) ] ||
+ fail "expected
HAS_FLAGS|READ_ONLY|SEND_DF|CAN_MULTI_CONN|SEND_CACHE"
+
+# NBD_FLAG_SEND_CACHE is always set even if can_cache returns false, because
+# nbdkit reckons it can emulate caching using pread.
--
2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 2/9] plugins: Add .cache callback
Make it possible for plugins to provide an alternative behavior for
NBD_CMD_CACHE rather than just discarding the buffer of .pread.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
docs/nbdkit-plugin.pod | 52 +++++++++++++++++++++++++++++++++++++++++
include/nbdkit-plugin.h | 2 ++
server/plugins.c | 17 +++++++++-----
3 files changed, 65 insertions(+), 6 deletions(-)
diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod
index 318be4c..0657151 100644
--- a/docs/nbdkit-plugin.pod
+++ b/docs/nbdkit-plugin.pod
@@ -620,6 +620,31 @@ with an error message and return C<-1>.
This callback is not required. If omitted, then we return false.
+=head2 C<.can_cache>
+
+ int can_cache (void *handle);
+
+This is called during the option negotiation phase to find out if the
+plugin supports a cache operation. The nature of the caching is
+unspecified (including whether there are limits on how much can be
+cached at once, and whether writes to a cached region have
+write-through or write-back semantics), but the command exists to let
+clients issue a hint to the server that they will be accessing that
+region of the export.
+
+If there is an error, C<.can_cache> should call C<nbdkit_error>
with
+an error message and return C<-1>. Since the NBD protocol has
+reserved the right to add future flags to mandate specific caching
+capabilities, this function should only return C<1> on success (in the
+future, other positive results may be treated as a bitmask of
+supported flags).
+
+This callback is not required. If omitted, then we return true iff a
+C<.cache> callback has been defined. Note that nbdkit always
+advertises caching capabilities to the client regardless of this
+callback; if this callback returns false, nbdkit handles a client
+cache request by calling C<.pread> and ignoring the resulting data.
+
=head2 C<.pread>
int pread (void *handle, void *buf, uint32_t count, uint64_t offset,
@@ -814,6 +839,33 @@ C<nbdkit_extent_add> returns C<0> on success or
C<-1> on failure. On
failure C<nbdkit_error> and/or C<nbdkit_set_error> has already been
called. C<errno> will be set to a suitable value.
+=head2 C<.cache>
+
+ int cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags);
+
+During the data serving phase, this callback is used to give the
+plugin a hint that the client intends to make further accesses to the
+given region of the export. The nature of caching is not specified
+further by the NBD specification (for example, a server may place
+limits on how much may be cached at once, and there is no way to
+control if writes to a cached area have write-through or write-back
+semantics). In fact, the cache command can always fail and still be
+compliant, and success might not guarantee a performance gain. If
+this callback is omitted, nbdkit defaults to providing cache semantics
+obtained by calling C<.pread> over the same region and ignoring the
+results.
+
+This function will not be called if C<.can_cache> returned false. The
+parameter C<flags> exists in case of future NBD protocol extensions;
+at this time, it will be 0 on input. A plugin must fail this function
+if C<flags> includes an unrecognized flag, as that may indicate a
+requirement that the plugin comply must with a specific caching
+semantic.
+
+If there is an error, C<.cache> should call C<nbdkit_error> with an
+error message, and C<nbdkit_set_error> to record an appropriate error
+(unless C<errno> is sufficient), then return C<-1>.
+
=head1 THREADS
Each nbdkit plugin must declare its thread safety model by defining
diff --git a/include/nbdkit-plugin.h b/include/nbdkit-plugin.h
index 54b4ce2..e9b1808 100644
--- a/include/nbdkit-plugin.h
+++ b/include/nbdkit-plugin.h
@@ -128,6 +128,8 @@ struct nbdkit_plugin {
int (*can_extents) (void *handle);
int (*extents) (void *handle, uint32_t count, uint64_t offset, uint32_t
flags,
struct nbdkit_extents *extents);
+ int (*can_cache) (void *handle);
+ int (*cache) (void *handle, uint32_t count, uint64_t offset, uint32_t flags);
};
extern void nbdkit_set_error (int err);
diff --git a/server/plugins.c b/server/plugins.c
index cabf543..947fe79 100644
--- a/server/plugins.c
+++ b/server/plugins.c
@@ -201,6 +201,8 @@ plugin_dump_fields (struct backend *b)
HAS (can_multi_conn);
HAS (can_extents);
HAS (extents);
+ HAS (can_cache);
+ HAS (cache);
#undef HAS
/* Custom fields. */
@@ -451,12 +453,16 @@ plugin_can_multi_conn (struct backend *b, struct
connection *conn)
static int
plugin_can_cache (struct backend *b, struct connection *conn)
{
+ struct backend_plugin *p = container_of (b, struct backend_plugin, backend);
+
assert (connection_get_handle (conn, 0));
debug ("can_cache");
- /* FIXME: return default based on plugin->cache */
- return 0;
+ if (p->plugin.can_cache)
+ return p->plugin.can_cache (connection_get_handle (conn, 0));
+ else
+ return p->plugin.cache != NULL;
}
/* Plugins and filters can call this to set the true errno, in cases
@@ -710,16 +716,15 @@ plugin_cache (struct backend *b, struct connection *conn,
int *err)
{
struct backend_plugin *p = container_of (b, struct backend_plugin, backend);
- int r = -1;
+ int r;
assert (connection_get_handle (conn, 0));
assert (!flags);
+ assert (p->plugin.cache);
debug ("cache count=%" PRIu32 " offset=%" PRIu64, count,
offset);
- /* FIXME: assert plugin->cache and call it */
- assert (false);
-
+ r = p->plugin.cache (connection_get_handle (conn, 0), count, offset,
flags);
if (r == -1)
*err = get_error (p);
return r;
--
2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 3/9] file, split: Implement .cache with posix_fadvise
Since NBD_CMD_CACHE is already advisory, let's use an advisory kernel
interface to implement it ;) When posix_fadvise() is not present, it
is likely that nbdkit's fallback to .pread will actually have a
similar benefit in populating the filesystem cache, since we aren't
using O_DIRECT to avoid that cache.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
configure.ac | 3 ++-
plugins/file/file.c | 22 ++++++++++++++++++++++
plugins/split/split.c | 37 ++++++++++++++++++++++++++++++++++++-
3 files changed, 60 insertions(+), 2 deletions(-)
diff --git a/configure.ac b/configure.ac
index 58031f3..06124c5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -195,7 +195,8 @@ dnl Check for functions in libc, all optional.
AC_CHECK_FUNCS([\
fdatasync \
get_current_dir_name \
- mkostemp])
+ mkostemp \
+ posix_fadvise])
dnl Check whether printf("%m") works
AC_CACHE_CHECK([whether the printf family supports %m],
diff --git a/plugins/file/file.c b/plugins/file/file.c
index f0ac23b..85b033b 100644
--- a/plugins/file/file.c
+++ b/plugins/file/file.c
@@ -608,6 +608,25 @@ file_extents (void *handle, uint32_t count, uint64_t
offset,
}
#endif /* SEEK_HOLE */
+#if HAVE_POSIX_FADVISE
+/* Caching. */
+static int
+file_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags)
+{
+ struct handle *h = handle;
+ int r;
+
+ /* Cache is advisory, we don't care if this fails */
+ r = posix_fadvise (h->fd, offset, count, POSIX_FADV_WILLNEED);
+ if (r) {
+ errno = r;
+ nbdkit_error ("posix_fadvise: %m");
+ return -1;
+ }
+ return 0;
+}
+#endif /* HAVE_POSIX_FADVISE */
+
static struct nbdkit_plugin plugin = {
.name = "file",
.longname = "nbdkit file plugin",
@@ -632,6 +651,9 @@ static struct nbdkit_plugin plugin = {
#ifdef SEEK_HOLE
.can_extents = file_can_extents,
.extents = file_extents,
+#endif
+#if HAVE_POSIX_FADVISE
+ .cache = file_cache,
#endif
.errno_is_preserved = 1,
};
diff --git a/plugins/split/split.c b/plugins/split/split.c
index cf2b2c7..cef2d51 100644
--- a/plugins/split/split.c
+++ b/plugins/split/split.c
@@ -1,5 +1,5 @@
/* nbdkit
- * Copyright (C) 2017-2018 Red Hat Inc.
+ * Copyright (C) 2017-2019 Red Hat Inc.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
@@ -277,6 +277,38 @@ split_pwrite (void *handle, const void *buf, uint32_t
count, uint64_t offset)
return 0;
}
+#if HAVE_POSIX_FADVISE
+/* Caching. */
+static int
+split_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags)
+{
+ struct handle *h = handle;
+
+ /* Cache is advisory, we don't care if this fails */
+ while (count > 0) {
+ struct file *file = get_file (h, offset);
+ uint64_t foffs = offset - file->offset;
+ uint64_t max;
+ int r;
+
+ max = file->size - foffs;
+ if (max > count)
+ max = count;
+
+ r = posix_fadvise (file->fd, offset, max, POSIX_FADV_WILLNEED);
+ if (r) {
+ errno = r;
+ nbdkit_error ("posix_fadvise: %m");
+ return -1;
+ }
+ count -= r;
+ offset += r;
+ }
+
+ return 0;
+}
+#endif /* HAVE_POSIX_FADVISE */
+
static struct nbdkit_plugin plugin = {
.name = "split",
.version = PACKAGE_VERSION,
@@ -289,6 +321,9 @@ static struct nbdkit_plugin plugin = {
.get_size = split_get_size,
.pread = split_pread,
.pwrite = split_pwrite,
+#if HAVE_POSIX_FADVISE
+ .cache = split_cache,
+#endif
/* In this plugin, errno is preserved properly along error return
* paths from failed system calls.
*/
--
2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 4/9] nbd: Implement NBD_CMD_CACHE passthrough
In the nbd plugin, if the remote server supports any form of caching,
we should utilize that rather than nbdkit's fallback to .pread.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
plugins/nbd/nbd.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c
index 821f256..71549d7 100644
--- a/plugins/nbd/nbd.c
+++ b/plugins/nbd/nbd.c
@@ -1151,6 +1151,14 @@ nbd_can_multi_conn (void *handle)
return h->flags & NBD_FLAG_CAN_MULTI_CONN;
}
+static int
+nbd_can_cache (void *handle)
+{
+ struct handle *h = handle;
+
+ return h->flags & NBD_FLAG_SEND_CACHE;
+}
+
static int
nbd_can_extents (void *handle)
{
@@ -1245,6 +1253,18 @@ nbd_extents (void *handle, uint32_t count, uint64_t
offset,
return c < 0 ? c : nbd_reply (h, c);
}
+/* Cache a portion of the file. */
+static int
+nbd_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags)
+{
+ struct handle *h = handle;
+ int c;
+
+ assert (!flags);
+ c = nbd_request (h, 0, NBD_CMD_CACHE, offset, count);
+ return c < 0 ? c : nbd_reply (h, c);
+}
+
static struct nbdkit_plugin plugin = {
.name = "nbd",
.longname = "nbdkit nbd plugin",
@@ -1264,12 +1284,14 @@ static struct nbdkit_plugin plugin = {
.can_fua = nbd_can_fua,
.can_multi_conn = nbd_can_multi_conn,
.can_extents = nbd_can_extents,
+ .can_cache = nbd_can_cache,
.pread = nbd_pread,
.pwrite = nbd_pwrite,
.zero = nbd_zero,
.flush = nbd_flush,
.trim = nbd_trim,
.extents = nbd_extents,
+ .cache = nbd_cache,
.errno_is_preserved = 1,
};
--
2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 5/9] plugins: Implement no-op .cache for in-memory plugins
For our plugins which have no backing file but generate everything on
the fly or store things in memory, falling back to .pread on a cache
request is just wasted work. Implement a no-op .cache callback for the
drivers where there is no benefit to trying to cache anything.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
plugins/data/data.c | 10 ++++++++++
plugins/full/full.c | 12 +++++++++++-
plugins/memory/memory.c | 10 ++++++++++
plugins/null/null.c | 12 +++++++++++-
plugins/pattern/pattern.c | 12 +++++++++++-
plugins/random/random.c | 12 +++++++++++-
plugins/streaming/streaming.c | 11 +++++++++++
plugins/zero/zero.c | 12 +++++++++++-
8 files changed, 86 insertions(+), 5 deletions(-)
diff --git a/plugins/data/data.c b/plugins/data/data.c
index 55380c6..aaa3d2d 100644
--- a/plugins/data/data.c
+++ b/plugins/data/data.c
@@ -377,6 +377,15 @@ data_extents (void *handle, uint32_t count, uint64_t
offset,
return sparse_array_extents (sa, count, offset, extents);
}
+/* Cache. */
+static int
+data_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags)
+{
+ /* Everything is already in memory, falling back to .pread is
+ actually slower than treating this as a no-op. */
+ return 0;
+}
+
static struct nbdkit_plugin plugin = {
.name = "data",
.version = PACKAGE_VERSION,
@@ -394,6 +403,7 @@ static struct nbdkit_plugin plugin = {
.zero = data_zero,
.trim = data_trim,
.extents = data_extents,
+ .cache = data_cache,
/* In this plugin, errno is preserved properly along error return
* paths from failed system calls.
*/
diff --git a/plugins/full/full.c b/plugins/full/full.c
index 7661856..51a9d67 100644
--- a/plugins/full/full.c
+++ b/plugins/full/full.c
@@ -1,5 +1,5 @@
/* nbdkit
- * Copyright (C) 2017-2018 Red Hat Inc.
+ * Copyright (C) 2017-2019 Red Hat Inc.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
@@ -137,6 +137,15 @@ full_extents (void *handle, uint32_t count, uint64_t
offset, uint32_t flags,
NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO);
}
+/* Cache. */
+static int
+full_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags)
+{
+ /* Everything is already in memory, falling back to .pread is
+ actually slower than treating this as a no-op. */
+ return 0;
+}
+
/* Note that we don't need to handle flush: If there has been previous
* write then we have already returned an error. If there have been
* no previous writes then flush can be ignored.
@@ -156,6 +165,7 @@ static struct nbdkit_plugin plugin = {
.zero = full_zero,
.trim = full_trim,
.extents = full_extents,
+ .cache = full_cache,
/* In this plugin, errno is preserved properly along error return
* paths from failed system calls.
*/
diff --git a/plugins/memory/memory.c b/plugins/memory/memory.c
index 90fa99e..baa29e2 100644
--- a/plugins/memory/memory.c
+++ b/plugins/memory/memory.c
@@ -172,6 +172,15 @@ memory_extents (void *handle, uint32_t count, uint64_t
offset,
return sparse_array_extents (sa, count, offset, extents);
}
+/* Cache. */
+static int
+memory_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags)
+{
+ /* Everything is already in memory, falling back to .pread is
+ actually slower than treating this as a no-op. */
+ return 0;
+}
+
static struct nbdkit_plugin plugin = {
.name = "memory",
.version = PACKAGE_VERSION,
@@ -189,6 +198,7 @@ static struct nbdkit_plugin plugin = {
.zero = memory_zero,
.trim = memory_trim,
.extents = memory_extents,
+ .cache = memory_cache,
/* In this plugin, errno is preserved properly along error return
* paths from failed system calls.
*/
diff --git a/plugins/null/null.c b/plugins/null/null.c
index 518b63b..5e40868 100644
--- a/plugins/null/null.c
+++ b/plugins/null/null.c
@@ -1,5 +1,5 @@
/* nbdkit
- * Copyright (C) 2017-2018 Red Hat Inc.
+ * Copyright (C) 2017-2019 Red Hat Inc.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
@@ -140,6 +140,15 @@ null_extents (void *handle, uint32_t count, uint64_t
offset, uint32_t flags,
NBDKIT_EXTENT_HOLE | NBDKIT_EXTENT_ZERO);
}
+/* Cache. */
+static int
+null_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags)
+{
+ /* Everything is already in memory, falling back to .pread is
+ actually slower than treating this as a no-op. */
+ return 0;
+}
+
static struct nbdkit_plugin plugin = {
.name = "null",
.version = PACKAGE_VERSION,
@@ -155,6 +164,7 @@ static struct nbdkit_plugin plugin = {
.can_fua = null_can_fua,
.flush = null_flush,
.extents = null_extents,
+ .cache = null_cache,
/* In this plugin, errno is preserved properly along error return
* paths from failed system calls.
*/
diff --git a/plugins/pattern/pattern.c b/plugins/pattern/pattern.c
index 115bd96..e25da36 100644
--- a/plugins/pattern/pattern.c
+++ b/plugins/pattern/pattern.c
@@ -1,5 +1,5 @@
/* nbdkit
- * Copyright (C) 2017-2018 Red Hat Inc.
+ * Copyright (C) 2017-2019 Red Hat Inc.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
@@ -117,6 +117,15 @@ pattern_pread (void *handle, void *buf, uint32_t count,
uint64_t offset,
return 0;
}
+/* Cache. */
+static int
+pattern_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags)
+{
+ /* Everything is already in memory, falling back to .pread is
+ actually slower than treating this as a no-op. */
+ return 0;
+}
+
static struct nbdkit_plugin plugin = {
.name = "pattern",
.version = PACKAGE_VERSION,
@@ -127,6 +136,7 @@ static struct nbdkit_plugin plugin = {
.get_size = pattern_get_size,
.can_multi_conn = pattern_can_multi_conn,
.pread = pattern_pread,
+ .cache = pattern_cache,
/* In this plugin, errno is preserved properly along error return
* paths from failed system calls.
*/
diff --git a/plugins/random/random.c b/plugins/random/random.c
index 7fb42c8..6219169 100644
--- a/plugins/random/random.c
+++ b/plugins/random/random.c
@@ -1,5 +1,5 @@
/* nbdkit
- * Copyright (C) 2017-2018 Red Hat Inc.
+ * Copyright (C) 2017-2019 Red Hat Inc.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
@@ -146,6 +146,15 @@ random_pread (void *handle, void *buf, uint32_t count,
uint64_t offset,
return 0;
}
+/* Cache. */
+static int
+random_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags)
+{
+ /* Everything is already in memory, falling back to .pread is
+ actually slower than treating this as a no-op. */
+ return 0;
+}
+
static struct nbdkit_plugin plugin = {
.name = "random",
.version = PACKAGE_VERSION,
@@ -157,6 +166,7 @@ static struct nbdkit_plugin plugin = {
.get_size = random_get_size,
.can_multi_conn = random_can_multi_conn,
.pread = random_pread,
+ .cache = random_cache,
/* In this plugin, errno is preserved properly along error return
* paths from failed system calls.
*/
diff --git a/plugins/streaming/streaming.c b/plugins/streaming/streaming.c
index 4ca3e76..505ed91 100644
--- a/plugins/streaming/streaming.c
+++ b/plugins/streaming/streaming.c
@@ -246,6 +246,16 @@ streaming_pread (void *handle, void *buf, uint32_t count,
uint64_t offset)
return -1;
}
+/* Cache. */
+static int
+streaming_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags)
+{
+ /* nbdkit's default of falling back to .pread is pointless: it will
+ * fail for addresses already written, and waste a memset() for
+ * addresses not yet reached. Treat this as a no-op instead. */
+ return 0;
+}
+
static struct nbdkit_plugin plugin = {
.name = "streaming",
.longname = "nbdkit streaming plugin",
@@ -259,6 +269,7 @@ static struct nbdkit_plugin plugin = {
.get_size = streaming_get_size,
.pwrite = streaming_pwrite,
.pread = streaming_pread,
+ .cache = streaming_cache,
.errno_is_preserved = 1,
};
diff --git a/plugins/zero/zero.c b/plugins/zero/zero.c
index 49ce08e..be8fc2a 100644
--- a/plugins/zero/zero.c
+++ b/plugins/zero/zero.c
@@ -1,5 +1,5 @@
/* nbdkit
- * Copyright (C) 2017-2018 Red Hat Inc.
+ * Copyright (C) 2017-2019 Red Hat Inc.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
@@ -76,6 +76,15 @@ zero_pread (void *handle, void *buf, uint32_t count, uint64_t
offset,
return -1;
}
+/* Cache. */
+static int
+zero_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags)
+{
+ /* Everything is already in memory, falling back to .pread is
+ actually slower than treating this as a no-op. */
+ return 0;
+}
+
static struct nbdkit_plugin plugin = {
.name = "zero",
.version = PACKAGE_VERSION,
@@ -83,6 +92,7 @@ static struct nbdkit_plugin plugin = {
.open = zero_open,
.get_size = zero_get_size,
.pread = zero_pread,
+ .cache = zero_cache,
/* In this plugin, errno is preserved properly along error return
* paths from failed system calls.
*/
--
2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 6/9] sh: Implement .cache script callback
It's easy to expose new callbacks to sh plugins. I have no idea how
any plugin would actually usefully cache anything in shell (you can't
really store it in a shell variable, since a new shell process is
started for each command); but if nothing else, this allows a user to
implement .can_cache returning true to bypass nbdkit's normal fallback
to .pread.
The shell plugin, coupled with Rich's work on libnbd as a client-side
library for actually exercising calls to NBD_CMD_CACHE, will be a
useful way to prove that cache commands even make it through the
stack. (Remember, qemu 3.0 was released with a fatally flawed
NBD_CMD_CACHE server implementation, because there were no open source
clients at the time that could actually send the command to test the
server with).
Signed-off-by: Eric Blake <eblake@redhat.com>
---
plugins/sh/nbdkit-sh-plugin.pod | 12 +++++++++-
plugins/sh/sh.c | 40 +++++++++++++++++++++++++++++++++
2 files changed, 51 insertions(+), 1 deletion(-)
diff --git a/plugins/sh/nbdkit-sh-plugin.pod b/plugins/sh/nbdkit-sh-plugin.pod
index 8af88b4..bfd58ab 100644
--- a/plugins/sh/nbdkit-sh-plugin.pod
+++ b/plugins/sh/nbdkit-sh-plugin.pod
@@ -218,9 +218,11 @@ This method is required.
=item C<can_extents>
+=item C<can_cache>
+
Unlike in other languages, you B<must> provide the C<can_*> methods
otherwise they are assumed to all return false and your C<pwrite>,
-C<flush>, C<trim>, C<zero> and C<extents> methods will
never be
+C<flush>, C<trim>, C<zero>, C<extents>, and
C<cache> methods will never be
called. The reason for this is obscure: In other languages we can
detect if (eg) a C<pwrite> method is defined and synthesize an
appropriate response if no actual C<can_write> method is defined.
@@ -232,6 +234,7 @@ possible with this plugin.
/path/to/script can_trim <handle>
/path/to/script can_zero <handle>
/path/to/script can_extents <handle>
+ /path/to/script can_cache <handle>
The script should exit with code C<0> for true or code C<3> for
false.
@@ -334,6 +337,13 @@ Unlike in other languages, if you provide an
C<extents> method you
B<must> also provide a C<can_extents> method which exits with code
C<0> (true).
+=item C<cache>
+
+ /path/to/script cache <handle> <count> <offset>
+
+Unlike in other languages, if you provide a C<cache> method you
B<must>
+also provide a C<can_cache> method which exits with code C<0>
(true).
+
=back
=head2 Missing callbacks
diff --git a/plugins/sh/sh.c b/plugins/sh/sh.c
index a5beb57..7aded44 100644
--- a/plugins/sh/sh.c
+++ b/plugins/sh/sh.c
@@ -578,6 +578,12 @@ sh_can_multi_conn (void *handle)
return boolean_method (handle, "can_multi_conn");
}
+static int
+sh_can_cache (void *handle)
+{
+ return boolean_method (handle, "can_cache");
+}
+
static int
sh_flush (void *handle, uint32_t flags)
{
@@ -782,6 +788,38 @@ sh_extents (void *handle, uint32_t count, uint64_t offset,
uint32_t flags,
}
}
+static int
+sh_cache (void *handle, uint32_t count, uint64_t offset, uint32_t flags)
+{
+ char *h = handle;
+ char cbuf[32], obuf[32];
+ const char *args[] = { script, "cache", h, cbuf, obuf, NULL };
+
+ snprintf (cbuf, sizeof cbuf, "%" PRIu32, count);
+ snprintf (obuf, sizeof obuf, "%" PRIu64, offset);
+ assert (!flags);
+
+ switch (call (args)) {
+ case OK:
+ return 0;
+
+ case MISSING:
+ /* Ignore lack of cache callback. */
+ return 0;
+
+ case ERROR:
+ return -1;
+
+ case RET_FALSE:
+ nbdkit_error ("%s: %s method returned unexpected code (3/false)",
+ script, "cache");
+ errno = EIO;
+ return -1;
+
+ default: abort ();
+ }
+}
+
#define sh_config_help \
"script=<FILENAME> (required) The shell script to run.\n"
\
"[other arguments may be used by the plugin that you load]"
@@ -812,6 +850,7 @@ static struct nbdkit_plugin plugin = {
.can_extents = sh_can_extents,
.can_fua = sh_can_fua,
.can_multi_conn = sh_can_multi_conn,
+ .can_cache = sh_can_cache,
.pread = sh_pread,
.pwrite = sh_pwrite,
@@ -819,6 +858,7 @@ static struct nbdkit_plugin plugin = {
.trim = sh_trim,
.zero = sh_zero,
.extents = sh_extents,
+ .cache = sh_cache,
.errno_is_preserved = 1,
};
--
2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 7/9] filters: Add .cache callback
Make it possible for filters to adjust the behavior for
NBD_CMD_CACHE. To avoid any 'git bisect' breakage, this patch defaults
.can_cache to false until all necessary filters have been patched
first.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
docs/nbdkit-filter.pod | 27 ++++++++++++++++++++++++++-
include/nbdkit-filter.h | 8 ++++++++
server/filters.c | 39 +++++++++++++++++++++++++++++++++------
3 files changed, 67 insertions(+), 7 deletions(-)
diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod
index 6aeaa7b..787333a 100644
--- a/docs/nbdkit-filter.pod
+++ b/docs/nbdkit-filter.pod
@@ -356,6 +356,8 @@ calls.
=head2 C<.can_multi_conn>
+=head2 C<.can_cache>
+
int (*can_write) (struct nbdkit_next_ops *next_ops, void *nxdata,
void *handle);
int (*can_flush) (struct nbdkit_next_ops *next_ops, void *nxdata,
@@ -373,6 +375,8 @@ calls.
void *handle);
int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata,
void *handle);
+ int (*can_cache) (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle);
These intercept the corresponding plugin methods, and control feature
bits advertised to the client.
@@ -597,6 +601,27 @@ Returns the number of extents in the list.
Returns a copy of the C<i>'th extent.
+=head2 C<.cache>
+
+ int (*cache) (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, uint32_t count, uint64_t offset,
+ uint32_t flags, int *err);
+
+This intercepts the plugin C<.cache> method and can be used to modify
+cache requests.
+
+This function will not be called if C<.can_cache> returned false; in
+turn, the filter should not call C<next_ops-E<gt>cache> if
+C<next_ops-E<gt>can_cache> did not return true.
+
+The parameter C<flags> exists in case of future NBD protocol
+extensions; at this time, it will be 0 on input, and the filter should
+not pass any flags to C<next_ops-E<gt>cache>.
+
+If there is an error, C<.cache> should call C<nbdkit_error> with an
+error message B<and> return -1 with C<err> set to the positive
errno
+value to return to the client.
+
=head1 ERROR HANDLING
If there is an error in the filter itself, the filter should call
@@ -708,4 +733,4 @@ Richard W.M. Jones
=head1 COPYRIGHT
-Copyright (C) 2013-2018 Red Hat Inc.
+Copyright (C) 2013-2019 Red Hat Inc.
diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h
index 9b6cd6e..5893dd8 100644
--- a/include/nbdkit-filter.h
+++ b/include/nbdkit-filter.h
@@ -74,6 +74,7 @@ struct nbdkit_next_ops {
int (*can_extents) (void *nxdata);
int (*can_fua) (void *nxdata);
int (*can_multi_conn) (void *nxdata);
+ int (*can_cache) (void *nxdata);
int (*pread) (void *nxdata, void *buf, uint32_t count, uint64_t offset,
uint32_t flags, int *err);
@@ -87,6 +88,8 @@ struct nbdkit_next_ops {
int *err);
int (*extents) (void *nxdata, uint32_t count, uint64_t offset, uint32_t
flags,
struct nbdkit_extents *extents, int *err);
+ int (*cache) (void *nxdata, uint32_t count, uint64_t offset, uint32_t flags,
+ int *err);
};
struct nbdkit_filter {
@@ -142,6 +145,8 @@ struct nbdkit_filter {
void *handle);
int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata,
void *handle);
+ int (*can_cache) (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle);
int (*pread) (struct nbdkit_next_ops *next_ops, void *nxdata,
void *handle, void *buf, uint32_t count, uint64_t offset,
@@ -161,6 +166,9 @@ struct nbdkit_filter {
int (*extents) (struct nbdkit_next_ops *next_ops, void *nxdata,
void *handle, uint32_t count, uint64_t offset, uint32_t
flags,
struct nbdkit_extents *extents, int *err);
+ int (*cache) (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, uint32_t count, uint64_t offset, uint32_t flags,
+ int *err);
};
#define NBDKIT_REGISTER_FILTER(filter) \
diff --git a/server/filters.c b/server/filters.c
index c619fd6..a48f67e 100644
--- a/server/filters.c
+++ b/server/filters.c
@@ -329,6 +329,13 @@ next_can_multi_conn (void *nxdata)
return b_conn->b->can_multi_conn (b_conn->b, b_conn->conn);
}
+static int
+next_can_cache (void *nxdata)
+{
+ struct b_conn *b_conn = nxdata;
+ return b_conn->b->can_cache (b_conn->b, b_conn->conn);
+}
+
static int
next_pread (void *nxdata, void *buf, uint32_t count, uint64_t offset,
uint32_t flags, int *err)
@@ -379,6 +386,15 @@ next_extents (void *nxdata, uint32_t count, uint64_t
offset, uint32_t flags,
extents, err);
}
+static int
+next_cache (void *nxdata, uint32_t count, uint64_t offset,
+ uint32_t flags, int *err)
+{
+ struct b_conn *b_conn = nxdata;
+ return b_conn->b->cache (b_conn->b, b_conn->conn, count, offset,
flags,
+ err);
+}
+
static struct nbdkit_next_ops next_ops = {
.get_size = next_get_size,
.can_write = next_can_write,
@@ -389,12 +405,14 @@ static struct nbdkit_next_ops next_ops = {
.can_extents = next_can_extents,
.can_fua = next_can_fua,
.can_multi_conn = next_can_multi_conn,
+ .can_cache = next_can_cache,
.pread = next_pread,
.pwrite = next_pwrite,
.flush = next_flush,
.trim = next_trim,
.zero = next_zero,
.extents = next_extents,
+ .cache = next_cache,
};
static int
@@ -577,12 +595,16 @@ static int
filter_can_cache (struct backend *b, struct connection *conn)
{
struct backend_filter *f = container_of (b, struct backend_filter, backend);
+ void *handle = connection_get_handle (conn, f->backend.i);
+ struct b_conn nxdata = { .b = f->backend.next, .conn = conn };
debug ("%s: can_cache", f->name);
- /* FIXME: Default to f->backend.next->can_cache, once all filters
- have been audited */
- return 0;
+ if (f->filter.can_cache)
+ return f->filter.can_cache (&next_ops, &nxdata, handle);
+ else
+ return 0; /* FIXME - allow passthrough once all filters are audited */
+ return f->backend.next->can_cache (f->backend.next, conn);
}
static int
@@ -720,15 +742,20 @@ filter_cache (struct backend *b, struct connection *conn,
uint32_t flags, int *err)
{
struct backend_filter *f = container_of (b, struct backend_filter, backend);
+ void *handle = connection_get_handle (conn, f->backend.i);
+ struct b_conn nxdata = { .b = f->backend.next, .conn = conn };
assert (flags == 0);
debug ("%s: cache count=%" PRIu32 " offset=%" PRIu64
" flags=0x%" PRIx32,
f->name, count, offset, flags);
- /* FIXME: Allow filter to rewrite request */
- return f->backend.next->cache (f->backend.next, conn,
- count, offset, flags, err);
+ if (f->filter.cache)
+ return f->filter.cache (&next_ops, &nxdata, handle,
+ count, offset, flags, err);
+ else
+ return f->backend.next->cache (f->backend.next, conn,
+ count, offset, flags, err);
}
static struct backend filter_functions = {
--
2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 8/9] test-layers: Test .cache usage
This also forms the first instance that I am aware of where an open
source client actually sends NBD_CMD_CACHE to the server - even if the
client is just a test app (the nbd plugin doesn't count - it won't
send to the server unless a client sends to the plugin first).
Signed-off-by: Eric Blake <eblake@redhat.com>
---
tests/test-layers-filter.c | 22 +++++++++++++++++++++-
tests/test-layers-plugin.c | 17 +++++++++++++++++
tests/test-layers.c | 36 ++++++++++++++++++++++++++++++++++++
3 files changed, 74 insertions(+), 1 deletion(-)
diff --git a/tests/test-layers-filter.c b/tests/test-layers-filter.c
index a8f4723..bd063bd 100644
--- a/tests/test-layers-filter.c
+++ b/tests/test-layers-filter.c
@@ -1,5 +1,5 @@
/* nbdkit
- * Copyright (C) 2018 Red Hat Inc.
+ * Copyright (C) 2018-2019 Red Hat Inc.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
@@ -184,6 +184,15 @@ test_layers_filter_can_extents (struct nbdkit_next_ops
*next_ops,
return next_ops->can_extents (nxdata);
}
+static int
+test_layers_filter_can_cache (struct nbdkit_next_ops *next_ops,
+ void *nxdata,
+ void *handle)
+{
+ DEBUG_FUNCTION;
+ return next_ops->can_cache (nxdata);
+}
+
static int
test_layers_filter_pread (struct nbdkit_next_ops *next_ops, void *nxdata,
void *handle, void *buf,
@@ -241,6 +250,15 @@ test_layers_filter_extents (struct nbdkit_next_ops
*next_ops, void *nxdata,
return next_ops->extents (nxdata, count, offset, flags, extents, err);
}
+static int
+test_layers_filter_cache (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, uint32_t count, uint64_t offset,
+ uint32_t flags, int *err)
+{
+ DEBUG_FUNCTION;
+ return next_ops->cache (nxdata, count, offset, flags, err);
+}
+
static struct nbdkit_filter filter = {
.name = "testlayers" layer,
.version = PACKAGE_VERSION,
@@ -262,12 +280,14 @@ static struct nbdkit_filter filter = {
.can_fua = test_layers_filter_can_fua,
.can_multi_conn = test_layers_filter_can_multi_conn,
.can_extents = test_layers_filter_can_extents,
+ .can_cache = test_layers_filter_can_cache,
.pread = test_layers_filter_pread,
.pwrite = test_layers_filter_pwrite,
.flush = test_layers_filter_flush,
.trim = test_layers_filter_trim,
.zero = test_layers_filter_zero,
.extents = test_layers_filter_extents,
+ .cache = test_layers_filter_cache,
};
NBDKIT_REGISTER_FILTER(filter)
diff --git a/tests/test-layers-plugin.c b/tests/test-layers-plugin.c
index f9b2014..981641e 100644
--- a/tests/test-layers-plugin.c
+++ b/tests/test-layers-plugin.c
@@ -143,6 +143,13 @@ test_layers_plugin_can_multi_conn (void *handle)
return 1;
}
+static int
+test_layers_plugin_can_cache (void *handle)
+{
+ DEBUG_FUNCTION;
+ return 1;
+}
+
static int
test_layers_plugin_can_extents (void *handle)
{
@@ -201,6 +208,14 @@ test_layers_plugin_extents (void *handle,
return nbdkit_add_extent (extents, offset, count, 0);
}
+static int
+test_layers_plugin_cache (void *handle,
+ uint32_t count, uint64_t offset, uint32_t flags)
+{
+ DEBUG_FUNCTION;
+ return 0;
+}
+
static struct nbdkit_plugin plugin = {
.name = "testlayersplugin",
.version = PACKAGE_VERSION,
@@ -220,12 +235,14 @@ static struct nbdkit_plugin plugin = {
.can_fua = test_layers_plugin_can_fua,
.can_multi_conn = test_layers_plugin_can_multi_conn,
.can_extents = test_layers_plugin_can_extents,
+ .can_cache = test_layers_plugin_can_cache,
.pread = test_layers_plugin_pread,
.pwrite = test_layers_plugin_pwrite,
.flush = test_layers_plugin_flush,
.trim = test_layers_plugin_trim,
.zero = test_layers_plugin_zero,
.extents = test_layers_plugin_extents,
+ .cache = test_layers_plugin_cache,
/* In this plugin, errno is preserved properly along error return
* paths from failed system calls.
*/
diff --git a/tests/test-layers.c b/tests/test-layers.c
index 627e4ec..a820ba5 100644
--- a/tests/test-layers.c
+++ b/tests/test-layers.c
@@ -361,6 +361,12 @@ main (int argc, char *argv[])
"filter1: test_layers_filter_can_extents",
"test_layers_plugin_can_extents",
NULL);
+ log_verify_seen_in_order
+ ("filter3: test_layers_filter_can_cache",
+ "filter2: test_layers_filter_can_cache",
+ "filter1: test_layers_filter_can_cache",
+ "test_layers_plugin_can_cache",
+ NULL);
fprintf (stderr, "%s: protocol connected\n", program_name);
@@ -526,6 +532,36 @@ main (int argc, char *argv[])
"test_layers_plugin_zero",
NULL);
+ request.type = htobe16 (NBD_CMD_CACHE);
+ request.offset = htobe64 (0);
+ request.count = htobe32 (512);
+ request.flags = htobe16 (0);
+ if (send (sock, &request, sizeof request, 0) != sizeof request) {
+ perror ("send: NBD_CMD_CACHE");
+ exit (EXIT_FAILURE);
+ }
+ if (recv (sock, &reply, sizeof reply, MSG_WAITALL) != sizeof reply) {
+ perror ("recv: NBD_CMD_CACHE");
+ exit (EXIT_FAILURE);
+ }
+ if (reply.error != NBD_SUCCESS) {
+ fprintf (stderr, "%s: NBD_CMD_CACHE failed with %d\n",
+ program_name, reply.error);
+ exit (EXIT_FAILURE);
+ }
+
+ sleep (1);
+ log_verify_seen_in_order
+ ("testlayersfilter3: cache count=512 offset=0 flags=0x0",
+ "filter3: test_layers_filter_cache",
+ "testlayersfilter2: cache count=512 offset=0 flags=0x0",
+ "filter2: test_layers_filter_cache",
+ "testlayersfilter1: cache count=512 offset=0 flags=0x0",
+ "filter1: test_layers_filter_cache",
+ "testlayersplugin: debug: cache count=512 offset=0",
+ "test_layers_plugin_cache",
+ NULL);
+
/* XXX We should test NBD_CMD_BLOCK_STATUS here. However it
* requires that we negotiate structured replies and base:allocation
* in the handshake, and the format of the reply is more complex
--
2.20.1
Eric Blake
2019-May-10 03:03 UTC
[Libguestfs] [nbdkit PATCH 9/9] blocksize: Implement .cache rounding
Rely on .can_cache passthrough to imply that our .cache won't be
called unless the plugin also has .cache. [Technically, that won't
happen until a later patch flips the default in filters.c]. Round the
cache request out, to cache the same range as would otherwise be
passed to the plugin's .pread if we had set .can_cache to
false. Fortunately, when maxlen is larger than maxblock, and the
plugin supports .cache, this results in a lot fewer calls into the
plugin than the .pread fallback.
Oddly enough, a client can submit an unaligned request for just under
4G of caching where our rounding would overflow a 32-bit integer, so
our rounding has to use a 64-bit temporary.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
filters/blocksize/blocksize.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/filters/blocksize/blocksize.c b/filters/blocksize/blocksize.c
index 4f3e9a3..2d28222 100644
--- a/filters/blocksize/blocksize.c
+++ b/filters/blocksize/blocksize.c
@@ -1,5 +1,5 @@
/* nbdkit
- * Copyright (C) 2018 Red Hat Inc.
+ * Copyright (C) 2018-2019 Red Hat Inc.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
@@ -389,6 +389,34 @@ blocksize_extents (struct nbdkit_next_ops *next_ops, void
*nxdata,
flags, extents, err);
}
+static int
+blocksize_cache (struct nbdkit_next_ops *next_ops, void *nxdata,
+ void *handle, uint32_t count, uint64_t offs, uint32_t flags,
+ int *err)
+{
+ uint32_t limit;
+ uint64_t remaining = count; /* Rounding out could exceed 32 bits */
+
+ /* Unaligned head */
+ limit = offs & (minblock - 1);
+ remaining += limit;
+ offs -= limit;
+
+ /* Unaligned tail */
+ remaining = ROUND_UP (remaining, minblock);
+
+ /* Aligned body */
+ while (remaining) {
+ limit = MIN (maxdata, remaining);
+ if (next_ops->cache (nxdata, limit, offs, flags, err) == -1)
+ return -1;
+ offs += limit;
+ remaining -= limit;
+ }
+
+ return 0;
+}
+
static struct nbdkit_filter filter = {
.name = "blocksize",
.longname = "nbdkit blocksize filter",
@@ -404,6 +432,7 @@ static struct nbdkit_filter filter = {
.trim = blocksize_trim,
.zero = blocksize_zero,
.extents = blocksize_extents,
+ .cache = blocksize_cache,
};
NBDKIT_REGISTER_FILTER(filter)
--
2.20.1
Richard W.M. Jones
2019-May-13 07:56 UTC
Re: [Libguestfs] [nbdkit PATCH 1/9] server: Internal hooks for implementing NBD_CMD_CACHE
On Thu, May 09, 2019 at 10:03:34PM -0500, Eric Blake wrote:> Until the next few patches expose and implement new callbacks for > filters and plugins, our initial implementation for NBD_CMD_CACHE is > to just blindly advertise the feature, and call into .pread with an > ignored buffer. > > Note that for bisection reasons, this patch treats any use of a filter > as a forced .can_cache of false to bypass the filter's caching; once > all affected filters are patched to handle cache requests correctly, a > later patch will then switch the filter default to passthrough for the > sake of remaining filters. > > Signed-off-by: Eric Blake <eblake@redhat.com> > > --- > > Note: we could also choose to always advertise caching, but make the > nbdkit default version be a no-op rather than call out to .pread. For > that matter, maybe a plugin's .can_cache should allow the plugin to > choose between a tri-state of no-op, fallback to .pread, or call > .cache; I'm not sure which is the saner default for a random plugin > compiled against older nbdkit. At least it's fairly easy to write a > filter that changes the default from no-op to .pread or from .pread to > no-op. Be thinking about that as you review the rest of the series.I'm worried that we end up in the same situation that we do with .zero: because we emulate it, we end up with a slow zero support, whereas clients [well, Nir S. mainly] actually want NBD_FLAG_SEND_WRITE_ZEROES == *fast* zeroing is available. In particular in the NBD_CMD_CACHE case: for some plugins emulating this behind the scenes with pread makes some sense. eg. It works for nbdkit-file-plugin because it will prefetch the data into Linux's page cache. However if a plugin makes a network access, eg. nbdkit-vddk-plugin, that can be both very expensive and has no prefetching benefit. Therefore I do think we should default to ignoring NBD_CMD_CACHE commands and *not* advertising the feature, except when the plugin says that it is able to handle it. For tri-stating .can_cache: yes that makes sense, because there is moderate difficulty for a plugin to emulate .cache using .pread. The plugin has to at least allocate a dummy buffer and throw it away, and we could probably do something more efficient than that if we implement an emulate mode in the server. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Richard W.M. Jones
2019-May-13 08:01 UTC
Re: [Libguestfs] [nbdkit PATCH 6/9] sh: Implement .cache script callback
For this plugin, if a tri-state can_cache is implemented, the code from sh_can_fua could be reused: https://github.com/libguestfs/nbdkit/blob/7a4f38f98920a1192e5c50ea9bd212e759e3f4e7/plugins/sh/sh.c#L530 Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Maybe Matching Threads
- [nbdkit PATCH v2 07/24] sh: Implement .cache script callback
- [nbdkit PATCH 1/2] sh, eval: Cache .can_zero and .can_flush
- [nbdkit PATCH v3 06/14] api: Add .export_description
- [nbdkit PATCH 3/3] plugins: Add .can_fast_zero hook
- [PATCH nbdkit 2/3] sh: Switch nbdkit-sh-plugin to use API version 2.