Richard W.M. Jones
2019-Jan-05 14:50 UTC
[Libguestfs] [PATCH nbdkit v2 01/11] server: Implement NBD_FLAG_CAN_MULTI_CONN.
For existing commits, this is almost identical to v1, except that I updated some commit messages and reordered the commits in a somewhat more logical sequence. The main changes are the extra commits: [06/11] plugins: Return NBD_FLAG_CAN_MULTI_CONN from some readonly plugins. - Readonly plugins that can set the flag unconditionally. [09/11] partitioning: Return NBD_FLAG_CAN_MULTI_CONN. [10/11] data, memory: Return NBD_FLAG_CAN_MULTI_CONN. - Writeable plugins where it should be safe to set the flag always. [11/11] memory, data: Use fine-grained locking and - Let's play with locking. That last commit should be "data, memory: ..." Rich.
Richard W.M. Jones
2019-Jan-05 14:50 UTC
[Libguestfs] [PATCH nbdkit v2 01/11] server: Implement NBD_FLAG_CAN_MULTI_CONN.
>From the protocol document:"NBD_FLAG_CAN_MULTI_CONN: Indicates that the server operates entirely without cache, or that the cache it uses is shared among all connections to the given device. In particular, if this flag is present, then the effects of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA MUST be visible across all connections when the server sends its reply to that command to the client. In the absence of this flag, clients SHOULD NOT multiplex their commands over more than one connection to the export." This is implemented in nbdkit by a boolean function which plugins may implement if they believe they satisfy the requirements above. It can also be passed through or modified by filters. It defaults to false, which is the safe default. --- docs/nbdkit-filter.pod | 4 ++++ docs/nbdkit-plugin.pod | 20 ++++++++++++++++++++ docs/nbdkit-protocol.pod | 4 ++++ include/nbdkit-filter.h | 3 +++ include/nbdkit-plugin.h | 2 ++ server/internal.h | 1 + server/protocol.h | 1 + server/connections.c | 9 +++++++++ server/filters.c | 24 ++++++++++++++++++++++++ server/plugins.c | 17 +++++++++++++++++ tests/test-layers-filter.c | 10 ++++++++++ tests/test-layers-plugin.c | 8 ++++++++ tests/test-layers.c | 6 ++++++ 13 files changed, 109 insertions(+) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index 1e1bf5d..4f27adb 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -354,6 +354,8 @@ calls. =head2 C<.can_fua> +=head2 C<.can_multi_conn> + int (*can_write) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); int (*can_flush) (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -367,6 +369,8 @@ calls. void *handle); int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); These intercept the corresponding plugin methods, and control feature bits advertised to the client. diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod index cfa14b0..c72ea0d 100644 --- a/docs/nbdkit-plugin.pod +++ b/docs/nbdkit-plugin.pod @@ -586,6 +586,26 @@ handle FUA requests. If omitted, nbdkit checks whether C<.flush> exists, and behaves as if this function returns C<NBDKIT_FUA_NONE> or C<NBDKIT_FUA_EMULATE> as appropriate. +=head2 C<.can_multi_conn> + + int can_multi_conn (void *handle); + +This is called during the option negotiation phase to find out if the +plugin can handle multiple connections from a single client. + +Specifically it means that either the plugin does not cache requests +at all. Or if it does cache then the effects of a C<.flush> request +or setting C<NBDKIT_FLAG_FUA> on a request must be visible across all +connections to the plugin before the plugin replies to that request. + +If you use the Linux NBD client with the I<-C> option then this flag +is checked. + +If there is an error, C<.can_multi_conn> should call C<nbdkit_error> +with an error message and return C<-1>. + +This callback is not required. If omitted, then we return false. + =head2 C<.pread> int pread (void *handle, void *buf, uint32_t count, uint64_t offset, diff --git a/docs/nbdkit-protocol.pod b/docs/nbdkit-protocol.pod index d5ba00f..db7fcf9 100644 --- a/docs/nbdkit-protocol.pod +++ b/docs/nbdkit-protocol.pod @@ -116,6 +116,10 @@ Supported in nbdkit E<ge> 1.9.3. This protocol enhancement allows a client to inspect details about the export without actually connecting. +=item C<NBD_FLAG_CAN_MULTI_CONN> + +Supported in nbdkit E<ge> 1.9.10. + =item Structured Replies I<Not supported>. diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h index 931d923..71c06c8 100644 --- a/include/nbdkit-filter.h +++ b/include/nbdkit-filter.h @@ -59,6 +59,7 @@ struct nbdkit_next_ops { int (*can_trim) (void *nxdata); int (*can_zero) (void *nxdata); int (*can_fua) (void *nxdata); + int (*can_multi_conn) (void *nxdata); int (*pread) (void *nxdata, void *buf, uint32_t count, uint64_t offset, uint32_t flags, int *err); @@ -121,6 +122,8 @@ struct nbdkit_filter { void *handle); int (*can_fua) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*can_multi_conn) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); int (*pread) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, void *buf, uint32_t count, uint64_t offset, diff --git a/include/nbdkit-plugin.h b/include/nbdkit-plugin.h index 31aa6c7..d43b2f5 100644 --- a/include/nbdkit-plugin.h +++ b/include/nbdkit-plugin.h @@ -123,6 +123,8 @@ struct nbdkit_plugin { #endif const char *magic_config_key; + + int (*can_multi_conn) (void *handle); }; extern void nbdkit_set_error (int err); diff --git a/server/internal.h b/server/internal.h index e5c91a9..1f237a5 100644 --- a/server/internal.h +++ b/server/internal.h @@ -212,6 +212,7 @@ struct backend { int (*can_trim) (struct backend *, struct connection *conn); int (*can_zero) (struct backend *, struct connection *conn); int (*can_fua) (struct backend *, struct connection *conn); + int (*can_multi_conn) (struct backend *, struct connection *conn); int (*pread) (struct backend *, struct connection *conn, void *buf, uint32_t count, uint64_t offset, uint32_t flags, int *err); diff --git a/server/protocol.h b/server/protocol.h index 6709ddc..003fc45 100644 --- a/server/protocol.h +++ b/server/protocol.h @@ -94,6 +94,7 @@ extern const char *name_of_nbd_flag (int); #define NBD_FLAG_ROTATIONAL (1 << 4) #define NBD_FLAG_SEND_TRIM (1 << 5) #define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6) +#define NBD_FLAG_CAN_MULTI_CONN (1 << 8) /* NBD options (new style handshake only). */ extern const char *name_of_nbd_opt (int); diff --git a/server/connections.c b/server/connections.c index 0a89315..51d4912 100644 --- a/server/connections.c +++ b/server/connections.c @@ -84,6 +84,7 @@ struct connection { bool can_trim; bool can_zero; bool can_fua; + bool can_multi_conn; bool using_tls; int sockin, sockout; @@ -463,6 +464,14 @@ compute_eflags (struct connection *conn, uint16_t *flags) conn->is_rotational = true; } + fl = backend->can_multi_conn (backend, conn); + if (fl == -1) + return -1; + if (fl) { + eflags |= NBD_FLAG_CAN_MULTI_CONN; + conn->can_multi_conn = true; + } + *flags = eflags; return 0; } diff --git a/server/filters.c b/server/filters.c index d02e7fb..5b7abc4 100644 --- a/server/filters.c +++ b/server/filters.c @@ -316,6 +316,13 @@ next_can_fua (void *nxdata) return b_conn->b->can_fua (b_conn->b, b_conn->conn); } +static int +next_can_multi_conn (void *nxdata) +{ + struct b_conn *b_conn = nxdata; + return b_conn->b->can_multi_conn (b_conn->b, b_conn->conn); +} + static int next_pread (void *nxdata, void *buf, uint32_t count, uint64_t offset, uint32_t flags, int *err) @@ -365,6 +372,7 @@ static struct nbdkit_next_ops next_ops = { .can_trim = next_can_trim, .can_zero = next_can_zero, .can_fua = next_can_fua, + .can_multi_conn = next_can_multi_conn, .pread = next_pread, .pwrite = next_pwrite, .flush = next_flush, @@ -518,6 +526,21 @@ filter_can_fua (struct backend *b, struct connection *conn) return f->backend.next->can_fua (f->backend.next, conn); } +static int +filter_can_multi_conn (struct backend *b, struct connection *conn) +{ + struct backend_filter *f = container_of (b, struct backend_filter, backend); + void *handle = connection_get_handle (conn, f->backend.i); + struct b_conn nxdata = { .b = f->backend.next, .conn = conn }; + + debug ("%s: can_multi_conn", f->name); + + if (f->filter.can_multi_conn) + return f->filter.can_multi_conn (&next_ops, &nxdata, handle); + else + return f->backend.next->can_multi_conn (f->backend.next, conn); +} + static int filter_pread (struct backend *b, struct connection *conn, void *buf, uint32_t count, uint64_t offset, @@ -645,6 +668,7 @@ static struct backend filter_functions = { .can_trim = filter_can_trim, .can_zero = filter_can_zero, .can_fua = filter_can_fua, + .can_multi_conn = filter_can_multi_conn, .pread = filter_pread, .pwrite = filter_pwrite, .flush = filter_flush, diff --git a/server/plugins.c b/server/plugins.c index cb0f411..bbde578 100644 --- a/server/plugins.c +++ b/server/plugins.c @@ -197,6 +197,7 @@ plugin_dump_fields (struct backend *b) HAS (flush); HAS (trim); HAS (zero); + HAS (can_multi_conn); #undef HAS /* Custom fields. */ @@ -414,6 +415,21 @@ plugin_can_fua (struct backend *b, struct connection *conn) return NBDKIT_FUA_NONE; } +static int +plugin_can_multi_conn (struct backend *b, struct connection *conn) +{ + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + + assert (connection_get_handle (conn, 0)); + + debug ("can_multi_conn"); + + if (p->plugin.can_multi_conn) + return p->plugin.can_multi_conn (connection_get_handle (conn, 0)); + else + return 0; /* assume false */ +} + /* Plugins and filters can call this to set the true errno, in cases * where !errno_is_preserved. */ @@ -656,6 +672,7 @@ static struct backend plugin_functions = { .can_trim = plugin_can_trim, .can_zero = plugin_can_zero, .can_fua = plugin_can_fua, + .can_multi_conn = plugin_can_multi_conn, .pread = plugin_pread, .pwrite = plugin_pwrite, .flush = plugin_flush, diff --git a/tests/test-layers-filter.c b/tests/test-layers-filter.c index c5a96e7..6da6ee6 100644 --- a/tests/test-layers-filter.c +++ b/tests/test-layers-filter.c @@ -167,6 +167,15 @@ test_layers_filter_can_fua (struct nbdkit_next_ops *next_ops, void *nxdata, return next_ops->can_fua (nxdata); } +static int +test_layers_filter_can_multi_conn (struct nbdkit_next_ops *next_ops, + void *nxdata, + void *handle) +{ + DEBUG_FUNCTION; + return next_ops->can_multi_conn (nxdata); +} + static int test_layers_filter_pread (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, void *buf, @@ -233,6 +242,7 @@ static struct nbdkit_filter filter = { .can_trim = test_layers_filter_can_trim, .can_zero = test_layers_filter_can_zero, .can_fua = test_layers_filter_can_fua, + .can_multi_conn = test_layers_filter_can_multi_conn, .pread = test_layers_filter_pread, .pwrite = test_layers_filter_pwrite, .flush = test_layers_filter_flush, diff --git a/tests/test-layers-plugin.c b/tests/test-layers-plugin.c index 3befcc1..042ecc3 100644 --- a/tests/test-layers-plugin.c +++ b/tests/test-layers-plugin.c @@ -137,6 +137,13 @@ test_layers_plugin_can_fua (void *handle) return NBDKIT_FUA_NATIVE; } +static int +test_layers_plugin_can_multi_conn (void *handle) +{ + DEBUG_FUNCTION; + return 1; +} + static int test_layers_plugin_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -196,6 +203,7 @@ static struct nbdkit_plugin plugin = { .can_trim = test_layers_plugin_can_trim, .can_zero = test_layers_plugin_can_zero, .can_fua = test_layers_plugin_can_fua, + .can_multi_conn = test_layers_plugin_can_multi_conn, .pread = test_layers_plugin_pread, .pwrite = test_layers_plugin_pwrite, .flush = test_layers_plugin_flush, diff --git a/tests/test-layers.c b/tests/test-layers.c index 261b9c6..c8d15d2 100644 --- a/tests/test-layers.c +++ b/tests/test-layers.c @@ -349,6 +349,12 @@ main (int argc, char *argv[]) "filter1: test_layers_filter_is_rotational", "test_layers_plugin_is_rotational", NULL); + log_verify_seen_in_order + ("filter3: test_layers_filter_can_multi_conn", + "filter2: test_layers_filter_can_multi_conn", + "filter1: test_layers_filter_can_multi_conn", + "test_layers_plugin_can_multi_conn", + NULL); fprintf (stderr, "%s: protocol connected\n", program_name); -- 2.19.2
Richard W.M. Jones
2019-Jan-05 14:50 UTC
[Libguestfs] [PATCH nbdkit v2 02/11] blocksize: Implement filtering of .can_multi_conn (forcing it to false).
I examined each filter to see which ones implement a cache and do not properly consider consistency across clients for flush requests. For these filters we should force .can_multi_conn to return false. I believe only one filter (blocksize) needs to be updated and all the other ones are safe. --- filters/blocksize/blocksize.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/filters/blocksize/blocksize.c b/filters/blocksize/blocksize.c index 34f58c9..9e0e713 100644 --- a/filters/blocksize/blocksize.c +++ b/filters/blocksize/blocksize.c @@ -164,6 +164,17 @@ blocksize_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, return size == -1 ? size : size & ~(minblock - 1); } +static int +blocksize_can_multi_conn (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + /* Although we are serializing all requests anyway so this likely + * doesn't make a difference, return false because the bounce buffer + * is not consistent for flush. + */ + return 0; +} + static int blocksize_pread (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, void *b, uint32_t count, uint64_t offs, @@ -370,6 +381,7 @@ static struct nbdkit_filter filter = { .config_help = blocksize_config_help, .prepare = blocksize_prepare, .get_size = blocksize_get_size, + .can_multi_conn = blocksize_can_multi_conn, .pread = blocksize_pread, .pwrite = blocksize_pwrite, .trim = blocksize_trim, -- 2.19.2
Richard W.M. Jones
2019-Jan-05 14:50 UTC
[Libguestfs] [PATCH nbdkit v2 03/11] nbd: Pass NBD_FLAG_CAN_MULTI_CONN through from clients.
--- plugins/nbd/nbd.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index 0443788..674f4a4 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -644,6 +644,14 @@ nbd_can_fua (void *handle) return h->flags & NBD_FLAG_SEND_FUA ? NBDKIT_FUA_NATIVE : NBDKIT_FUA_NONE; } +static int +nbd_can_multi_conn (void *handle) +{ + struct handle *h = handle; + + return h->flags & NBD_FLAG_CAN_MULTI_CONN; +} + /* Read data from the file. */ static int nbd_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -732,6 +740,7 @@ static struct nbdkit_plugin plugin = { .can_trim = nbd_can_trim, .can_zero = nbd_can_zero, .can_fua = nbd_can_fua, + .can_multi_conn = nbd_can_multi_conn, .pread = nbd_pread, .pwrite = nbd_pwrite, .zero = nbd_zero, -- 2.19.2
Richard W.M. Jones
2019-Jan-05 14:50 UTC
[Libguestfs] [PATCH nbdkit v2 04/11] ocaml: Implement .can_multi_conn callback for OCaml plugins.
--- plugins/ocaml/ocaml.c | 25 +++++++++++++++++++++++++ plugins/ocaml/NBDKit.ml | 10 +++++++++- plugins/ocaml/NBDKit.mli | 2 ++ 3 files changed, 36 insertions(+), 1 deletion(-) diff --git a/plugins/ocaml/ocaml.c b/plugins/ocaml/ocaml.c index 7dd5e4e..a89ad08 100644 --- a/plugins/ocaml/ocaml.c +++ b/plugins/ocaml/ocaml.c @@ -111,6 +111,8 @@ static value flush_fn; static value trim_fn; static value zero_fn; +static value can_multi_conn_fn; + /* Wrapper functions that translate calls from C (ie. nbdkit) to OCaml. */ static void @@ -167,6 +169,8 @@ unload_wrapper (void) REMOVE (flush); REMOVE (trim); REMOVE (zero); + + REMOVE (can_multi_conn); } static int @@ -550,6 +554,25 @@ zero_wrapper (void *h, uint32_t count, uint64_t offset, uint32_t flags) CAMLreturnT (int, 0); } +static int +can_multi_conn_wrapper (void *h) +{ + CAMLparam0 (); + CAMLlocal1 (rv); + + caml_leave_blocking_section (); + + rv = caml_callback_exn (can_multi_conn_fn, *(value *) h); + if (Is_exception_result (rv)) { + nbdkit_error ("%s", caml_format_exception (Extract_exception (rv))); + caml_enter_blocking_section (); + CAMLreturnT (int, -1); + } + + caml_enter_blocking_section (); + CAMLreturnT (int, Bool_val (rv)); +} + value ocaml_nbdkit_set_thread_model (value modelv) { @@ -629,6 +652,8 @@ SET(flush) SET(trim) SET(zero) +SET(can_multi_conn) + /* NB: noalloc function. */ value ocaml_nbdkit_set_error (value nv) diff --git a/plugins/ocaml/NBDKit.ml b/plugins/ocaml/NBDKit.ml index 1937bc2..fdeecd3 100644 --- a/plugins/ocaml/NBDKit.ml +++ b/plugins/ocaml/NBDKit.ml @@ -71,6 +71,8 @@ type 'a plugin = { flush : ('a -> flags -> unit) option; trim : ('a -> int32 -> int64 -> flags -> unit) option; zero : ('a -> int32 -> int64 -> flags -> unit) option; + + can_multi_conn : ('a -> bool) option; } let default_callbacks = { @@ -106,6 +108,8 @@ let default_callbacks = { flush = None; trim = None; zero = None; + + can_multi_conn = None; } type thread_model @@ -149,6 +153,8 @@ external set_flush : ('a -> flags -> unit) -> unit = "ocaml_nbdkit_set_flush" external set_trim : ('a -> int32 -> int64 -> flags -> unit) -> unit = "ocaml_nbdkit_set_trim" external set_zero : ('a -> int32 -> int64 -> flags -> unit) -> unit = "ocaml_nbdkit_set_zero" +external set_can_multi_conn : ('a -> bool) -> unit = "ocaml_nbdkit_set_can_multi_conn" + let may f = function None -> () | Some a -> f a let register_plugin thread_model plugin @@ -203,7 +209,9 @@ let register_plugin thread_model plugin may set_pwrite plugin.pwrite; may set_flush plugin.flush; may set_trim plugin.trim; - may set_zero plugin.zero + may set_zero plugin.zero; + + may set_can_multi_conn plugin.can_multi_conn external _set_error : int -> unit = "ocaml_nbdkit_set_error" "noalloc" diff --git a/plugins/ocaml/NBDKit.mli b/plugins/ocaml/NBDKit.mli index c0f9248..ec888ce 100644 --- a/plugins/ocaml/NBDKit.mli +++ b/plugins/ocaml/NBDKit.mli @@ -73,6 +73,8 @@ type 'a plugin = { flush : ('a -> flags -> unit) option; trim : ('a -> int32 -> int64 -> flags -> unit) option; zero : ('a -> int32 -> int64 -> flags -> unit) option; + + can_multi_conn : ('a -> bool) option; } (** The plugin fields and callbacks. ['a] is the handle type. *) -- 2.19.2
Richard W.M. Jones
2019-Jan-05 14:50 UTC
[Libguestfs] [PATCH nbdkit v2 05/11] sh: Implement can_multi_conn callback for shell script plugins.
--- plugins/sh/sh.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/plugins/sh/sh.c b/plugins/sh/sh.c index 67750e5..49ade78 100644 --- a/plugins/sh/sh.c +++ b/plugins/sh/sh.c @@ -579,6 +579,12 @@ sh_can_fua (void *handle) } } +static int +sh_can_multi_conn (void *handle) +{ + return boolean_method (handle, "can_multi_conn"); +} + static int sh_flush (void *handle, uint32_t flags) { @@ -699,6 +705,7 @@ static struct nbdkit_plugin plugin = { .can_trim = sh_can_trim, .can_zero = sh_can_zero, .can_fua = sh_can_fua, + .can_multi_conn = sh_can_multi_conn, .pread = sh_pread, .pwrite = sh_pwrite, -- 2.19.2
Richard W.M. Jones
2019-Jan-05 14:50 UTC
[Libguestfs] [PATCH nbdkit v2 06/11] plugins: Return NBD_FLAG_CAN_MULTI_CONN from some readonly plugins.
It is safe to return NBD_FLAG_CAN_MULTI_CONN from almost all readonly plugins. However only do it where the plugin thread model is NBDKIT_THREAD_MODEL_PARALLEL and the plugin could plausibly be used in a high performance situation. --- plugins/floppy/floppy.c | 8 ++++++++ plugins/iso/iso.c | 8 ++++++++ plugins/pattern/pattern.c | 8 ++++++++ plugins/random/random.c | 8 ++++++++ 4 files changed, 32 insertions(+) diff --git a/plugins/floppy/floppy.c b/plugins/floppy/floppy.c index 5e23c88..0efb8dd 100644 --- a/plugins/floppy/floppy.c +++ b/plugins/floppy/floppy.c @@ -122,6 +122,13 @@ floppy_get_size (void *handle) return virtual_size (&floppy.regions); } +/* Serves the same data over multiple connections. */ +static int +floppy_can_multi_conn (void *handle) +{ + return 1; +} + /* Read data from the file. */ static int floppy_pread (void *handle, void *buf, uint32_t count, uint64_t offset) @@ -192,6 +199,7 @@ static struct nbdkit_plugin plugin = { .magic_config_key = "dir", .open = floppy_open, .get_size = floppy_get_size, + .can_multi_conn = floppy_can_multi_conn, .pread = floppy_pread, .errno_is_preserved = 1, }; diff --git a/plugins/iso/iso.c b/plugins/iso/iso.c index 7ed5e7a..7431b48 100644 --- a/plugins/iso/iso.c +++ b/plugins/iso/iso.c @@ -258,6 +258,13 @@ iso_get_size (void *handle) return statbuf.st_size; } +/* Serves the same data over multiple connections. */ +static int +iso_can_multi_conn (void *handle) +{ + return 1; +} + /* Read data from the file. */ static int iso_pread (void *handle, void *buf, uint32_t count, uint64_t offset) @@ -291,6 +298,7 @@ static struct nbdkit_plugin plugin = { .magic_config_key = "dir", .open = iso_open, .get_size = iso_get_size, + .can_multi_conn = iso_can_multi_conn, .pread = iso_pread, .errno_is_preserved = 1, }; diff --git a/plugins/pattern/pattern.c b/plugins/pattern/pattern.c index 32db381..23e9019 100644 --- a/plugins/pattern/pattern.c +++ b/plugins/pattern/pattern.c @@ -88,6 +88,13 @@ pattern_get_size (void *handle) return size; } +/* Serves the same data over multiple connections. */ +static int +pattern_can_multi_conn (void *handle) +{ + return 1; +} + /* Read data. */ static int pattern_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -118,6 +125,7 @@ static struct nbdkit_plugin plugin = { .config_help = pattern_config_help, .open = pattern_open, .get_size = pattern_get_size, + .can_multi_conn = pattern_can_multi_conn, .pread = pattern_pread, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. diff --git a/plugins/random/random.c b/plugins/random/random.c index 9c805ab..ba51520 100644 --- a/plugins/random/random.c +++ b/plugins/random/random.c @@ -107,6 +107,13 @@ random_get_size (void *handle) return size; } +/* Serves the same data over multiple connections. */ +static int +random_can_multi_conn (void *handle) +{ + return 1; +} + /* Read data. */ static int random_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -148,6 +155,7 @@ static struct nbdkit_plugin plugin = { .config_help = random_config_help, .open = random_open, .get_size = random_get_size, + .can_multi_conn = random_can_multi_conn, .pread = random_pread, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. -- 2.19.2
Richard W.M. Jones
2019-Jan-05 14:50 UTC
[Libguestfs] [PATCH nbdkit v2 07/11] file: Implement NBDKIT_API_VERSION 2.
Upgrade this plugin to version 2 of the API. The main change is implementing FUA support. --- plugins/file/file.c | 62 ++++++++++++++++++++++++++++++--------------- 1 file changed, 42 insertions(+), 20 deletions(-) diff --git a/plugins/file/file.c b/plugins/file/file.c index 2a8c9b5..dcff0ee 100644 --- a/plugins/file/file.c +++ b/plugins/file/file.c @@ -52,6 +52,8 @@ #include <linux/fs.h> /* For BLKZEROOUT */ #endif +#define NBDKIT_API_VERSION 2 + #include <nbdkit-plugin.h> #include "isaligned.h" @@ -263,9 +265,30 @@ file_can_trim (void *handle) #endif } +static int +file_can_fua (void *handle) +{ + return NBDKIT_FUA_NATIVE; +} + +/* Flush the file to disk. */ +static int +file_flush (void *handle, uint32_t flags) +{ + struct handle *h = handle; + + if (fdatasync (h->fd) == -1) { + nbdkit_error ("fdatasync: %m"); + return -1; + } + + return 0; +} + /* Read data from the file. */ static int -file_pread (void *handle, void *buf, uint32_t count, uint64_t offset) +file_pread (void *handle, void *buf, uint32_t count, uint64_t offset, + uint32_t flags) { struct handle *h = handle; @@ -289,7 +312,8 @@ file_pread (void *handle, void *buf, uint32_t count, uint64_t offset) /* Write data to the file. */ static int -file_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset) +file_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset, + uint32_t flags) { struct handle *h = handle; @@ -304,6 +328,9 @@ file_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset) offset += r; } + if ((flags & NBDKIT_FLAG_FUA) && file_flush (handle, 0) == -1) + return -1; + return 0; } @@ -323,20 +350,20 @@ do_fallocate(int fd, int mode, off_t offset, off_t len) /* Write zeroes to the file. */ static int -file_zero (void *handle, uint32_t count, uint64_t offset, int may_trim) +file_zero (void *handle, uint32_t count, uint64_t offset, uint32_t flags) { struct handle *h = handle; int r; #ifdef FALLOC_FL_PUNCH_HOLE - if (h->can_punch_hole && may_trim) { + if (h->can_punch_hole && (flags & NBDKIT_FLAG_MAY_TRIM)) { r = do_fallocate (h->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, count); if (r == 0) { if (file_debug_zero) nbdkit_debug ("h->can_punch_hole && may_trim: " "zero succeeded using fallocate"); - return 0; + goto out; } if (errno != EOPNOTSUPP) { @@ -355,7 +382,7 @@ file_zero (void *handle, uint32_t count, uint64_t offset, int may_trim) if (file_debug_zero) nbdkit_debug ("h->can_zero-range: " "zero succeeded using fallocate"); - return 0; + goto out; } if (errno != EOPNOTSUPP) { @@ -380,7 +407,7 @@ file_zero (void *handle, uint32_t count, uint64_t offset, int may_trim) if (file_debug_zero) nbdkit_debug ("h->can_punch_hole && h->can_fallocate: " "zero succeeded using fallocate"); - return 0; + goto out; } if (errno != EOPNOTSUPP) { @@ -410,7 +437,7 @@ file_zero (void *handle, uint32_t count, uint64_t offset, int may_trim) if (file_debug_zero) nbdkit_debug ("h->can_zeroout && IS_ALIGNED: " "zero succeeded using BLKZEROOUT"); - return 0; + goto out; } if (errno != ENOTTY) { @@ -427,25 +454,16 @@ file_zero (void *handle, uint32_t count, uint64_t offset, int may_trim) nbdkit_debug ("zero falling back to writing"); errno = EOPNOTSUPP; return -1; -} -/* Flush the file to disk. */ -static int -file_flush (void *handle) -{ - struct handle *h = handle; - - if (fdatasync (h->fd) == -1) { - nbdkit_error ("fdatasync: %m"); + out: + if ((flags & NBDKIT_FLAG_FUA) && file_flush (handle, 0) == -1) return -1; - } - return 0; } /* Punch a hole in the file. */ static int -file_trim (void *handle, uint32_t count, uint64_t offset) +file_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags) { #ifdef FALLOC_FL_PUNCH_HOLE struct handle *h = handle; @@ -470,6 +488,9 @@ file_trim (void *handle, uint32_t count, uint64_t offset) } #endif + if ((flags & NBDKIT_FLAG_FUA) && file_flush (handle, 0) == -1) + return -1; + return 0; } @@ -487,6 +508,7 @@ static struct nbdkit_plugin plugin = { .close = file_close, .get_size = file_get_size, .can_trim = file_can_trim, + .can_fua = file_can_fua, .pread = file_pread, .pwrite = file_pwrite, .flush = file_flush, -- 2.19.2
Richard W.M. Jones
2019-Jan-05 14:50 UTC
[Libguestfs] [PATCH nbdkit v2 08/11] file: Return NBD_FLAG_CAN_MULTI_CONN for the file plugin.
This allows multiple connections from a single client, and should be safe assuming flush/FUA has been implemented correctly in the previous commit. --- plugins/file/file.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/plugins/file/file.c b/plugins/file/file.c index dcff0ee..628f8fb 100644 --- a/plugins/file/file.c +++ b/plugins/file/file.c @@ -250,6 +250,13 @@ file_get_size (void *handle) } } +/* Allow multiple parallel connections from a single client. */ +static int +file_can_multi_conn (void *handle) +{ + return 1; +} + static int file_can_trim (void *handle) { @@ -507,6 +514,7 @@ static struct nbdkit_plugin plugin = { .open = file_open, .close = file_close, .get_size = file_get_size, + .can_multi_conn = file_can_multi_conn, .can_trim = file_can_trim, .can_fua = file_can_fua, .pread = file_pread, -- 2.19.2
Richard W.M. Jones
2019-Jan-05 14:50 UTC
[Libguestfs] [PATCH nbdkit v2 09/11] partitioning: Return NBD_FLAG_CAN_MULTI_CONN.
This plugin has a parallel thread model and handles flush correctly across connections, so return NBD_FLAG_CAN_MULTI_CONN. --- plugins/partitioning/partitioning.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/plugins/partitioning/partitioning.c b/plugins/partitioning/partitioning.c index 43bdd7c..205c059 100644 --- a/plugins/partitioning/partitioning.c +++ b/plugins/partitioning/partitioning.c @@ -288,6 +288,13 @@ partitioning_get_size (void *handle) return virtual_size (®ions); } +/* Serves the same data over multiple connections. */ +static int +partitioning_can_multi_conn (void *handle) +{ + return 1; +} + /* Read data. */ static int partitioning_pread (void *handle, void *buf, uint32_t count, uint64_t offset) @@ -416,6 +423,7 @@ static struct nbdkit_plugin plugin = { .magic_config_key = "file", .open = partitioning_open, .get_size = partitioning_get_size, + .can_multi_conn = partitioning_can_multi_conn, .pread = partitioning_pread, .pwrite = partitioning_pwrite, .flush = partitioning_flush, -- 2.19.2
Richard W.M. Jones
2019-Jan-05 14:50 UTC
[Libguestfs] [PATCH nbdkit v2 10/11] data, memory: Return NBD_FLAG_CAN_MULTI_CONN.
These plugins use NBDKIT_THREAD_MODEL_SERIALIZE_ALL_REQUESTS because of their use of the global sparse array. However since they are both RAM disks, and thus (a) flush is a no-op, and (b) the locked part of each request is short, it is both safe and there may be some performance benefit to using NBD_FLAG_CAN_MULTI_CONN. --- plugins/data/data.c | 8 ++++++++ plugins/memory/memory.c | 8 ++++++++ 2 files changed, 16 insertions(+) diff --git a/plugins/data/data.c b/plugins/data/data.c index f8e91be..11a7b69 100644 --- a/plugins/data/data.c +++ b/plugins/data/data.c @@ -324,6 +324,13 @@ data_get_size (void *handle) return size; } +/* Serves the same data over multiple connections. */ +static int +data_can_multi_conn (void *handle) +{ + return 1; +} + /* Read data. */ static int data_pread (void *handle, void *buf, uint32_t count, uint64_t offset) @@ -366,6 +373,7 @@ static struct nbdkit_plugin plugin = { .dump_plugin = data_dump_plugin, .open = data_open, .get_size = data_get_size, + .can_multi_conn = data_can_multi_conn, .pread = data_pread, .pwrite = data_pwrite, .zero = data_zero, diff --git a/plugins/memory/memory.c b/plugins/memory/memory.c index 4013087..0084401 100644 --- a/plugins/memory/memory.c +++ b/plugins/memory/memory.c @@ -115,6 +115,13 @@ memory_get_size (void *handle) return (int64_t) size; } +/* Serves the same data over multiple connections. */ +static int +memory_can_multi_conn (void *handle) +{ + return 1; +} + /* Read data. */ static int memory_pread (void *handle, void *buf, uint32_t count, uint64_t offset) @@ -156,6 +163,7 @@ static struct nbdkit_plugin plugin = { .config_help = memory_config_help, .open = memory_open, .get_size = memory_get_size, + .can_multi_conn = memory_can_multi_conn, .pread = memory_pread, .pwrite = memory_pwrite, .zero = memory_zero, -- 2.19.2
Richard W.M. Jones
2019-Jan-05 14:50 UTC
[Libguestfs] [PATCH nbdkit v2 11/11] memory, data: Use fine-grained locking and change thread model to parallel.
Instead of locking around every request, use explicit and (slightly) more fine-grained, and change the thread model to parallel. The same change is made to the memory plugin and the data plugin. This improves performance slightly. Using fio with 8 threads and multi-conn enabled with -C 8: Before: read: IOPS=103k, BW=401MiB/s (420MB/s)(46.0GiB/120002msec) write: IOPS=103k, BW=401MiB/s (420MB/s)(46.0GiB/120002msec) After: read: IOPS=112k, BW=437MiB/s (458MB/s)(51.2GiB/120001msec) write: IOPS=112k, BW=437MiB/s (458MB/s)(51.2GiB/120001msec) For comparison: The memory plugin implemented using a simple malloc instead of a sparse array: read: IOPS=133k, BW=518MiB/s (544MB/s)(60.7GiB/120002msec) write: IOPS=133k, BW=518MiB/s (543MB/s)(60.7GiB/120002msec) Directly running fio against /dev/shm: read: IOPS=1018k, BW=3978MiB/s (4171MB/s)(466GiB/120001msec) write: IOPS=1018k, BW=3979MiB/s (4172MB/s)(466GiB/120001msec) --- plugins/data/data.c | 22 +++++++++++++++++++--- plugins/memory/memory.c | 21 +++++++++++++++++++-- 2 files changed, 38 insertions(+), 5 deletions(-) diff --git a/plugins/data/data.c b/plugins/data/data.c index 11a7b69..1d4c5b9 100644 --- a/plugins/data/data.c +++ b/plugins/data/data.c @@ -39,6 +39,8 @@ #include <inttypes.h> #include <string.h> +#include <pthread.h> + #if defined(HAVE_GNUTLS) && defined(HAVE_GNUTLS_BASE64_DECODE2) #include <gnutls/gnutls.h> #endif @@ -56,8 +58,11 @@ static int64_t size = -1; /* Size of data specified on the command line. */ static int64_t data_size = -1; -/* Sparse array. */ +/* Sparse array - the lock must be held when accessing this from + * connected callbacks. + */ static struct sparse_array *sa; +static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* Debug directory operations (-D data.dir=1). */ int data_debug_dir; @@ -305,7 +310,7 @@ data_dump_plugin (void) #endif } -#define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_ALL_REQUESTS +#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL /* No meaning, just used as the address for the handle. */ static int dh; @@ -335,7 +340,9 @@ data_can_multi_conn (void *handle) static int data_pread (void *handle, void *buf, uint32_t count, uint64_t offset) { + pthread_mutex_lock (&lock); sparse_array_read (sa, buf, count, offset); + pthread_mutex_unlock (&lock); return 0; } @@ -343,14 +350,21 @@ data_pread (void *handle, void *buf, uint32_t count, uint64_t offset) static int data_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset) { - return sparse_array_write (sa, buf, count, offset); + int r; + + pthread_mutex_lock (&lock); + r = sparse_array_write (sa, buf, count, offset); + pthread_mutex_unlock (&lock); + return r; } /* Zero. */ static int data_zero (void *handle, uint32_t count, uint64_t offset, int may_trim) { + pthread_mutex_lock (&lock); sparse_array_zero (sa, count, offset); + pthread_mutex_unlock (&lock); return 0; } @@ -358,7 +372,9 @@ data_zero (void *handle, uint32_t count, uint64_t offset, int may_trim) static int data_trim (void *handle, uint32_t count, uint64_t offset) { + pthread_mutex_lock (&lock); sparse_array_zero (sa, count, offset); + pthread_mutex_unlock (&lock); return 0; } diff --git a/plugins/memory/memory.c b/plugins/memory/memory.c index 0084401..e27e127 100644 --- a/plugins/memory/memory.c +++ b/plugins/memory/memory.c @@ -42,6 +42,8 @@ #include <errno.h> #include <assert.h> +#include <pthread.h> + #include <nbdkit-plugin.h> #include "sparse.h" @@ -52,7 +54,11 @@ static int64_t size = 0; /* Debug directory operations (-D memory.dir=1). */ int memory_debug_dir; +/* Sparse array - the lock must be held when accessing this from + * connected callbacks. + */ static struct sparse_array *sa; +static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; static void memory_load (void) @@ -106,7 +112,7 @@ memory_open (int readonly) return NBDKIT_HANDLE_NOT_NEEDED; } -#define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_ALL_REQUESTS +#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL /* Get the disk size. */ static int64_t @@ -126,7 +132,9 @@ memory_can_multi_conn (void *handle) static int memory_pread (void *handle, void *buf, uint32_t count, uint64_t offset) { + pthread_mutex_lock (&lock); sparse_array_read (sa, buf, count, offset); + pthread_mutex_unlock (&lock); return 0; } @@ -134,14 +142,21 @@ memory_pread (void *handle, void *buf, uint32_t count, uint64_t offset) static int memory_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset) { - return sparse_array_write (sa, buf, count, offset); + int r; + + pthread_mutex_lock (&lock); + r = sparse_array_write (sa, buf, count, offset); + pthread_mutex_unlock (&lock); + return r; } /* Zero. */ static int memory_zero (void *handle, uint32_t count, uint64_t offset, int may_trim) { + pthread_mutex_lock (&lock); sparse_array_zero (sa, count, offset); + pthread_mutex_unlock (&lock); return 0; } @@ -149,7 +164,9 @@ memory_zero (void *handle, uint32_t count, uint64_t offset, int may_trim) static int memory_trim (void *handle, uint32_t count, uint64_t offset) { + pthread_mutex_lock (&lock); sparse_array_zero (sa, count, offset); + pthread_mutex_unlock (&lock); return 0; } -- 2.19.2
Eric Blake
2019-Jan-05 22:48 UTC
Re: [Libguestfs] [PATCH nbdkit v2 02/11] blocksize: Implement filtering of .can_multi_conn (forcing it to false).
On 1/5/19 8:50 AM, Richard W.M. Jones wrote:> I examined each filter to see which ones implement a cache and do not > properly consider consistency across clients for flush requests. For > these filters we should force .can_multi_conn to return false. > > I believe only one filter (blocksize) needs to be updated and all the > other ones are safe. > --- > filters/blocksize/blocksize.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/filters/blocksize/blocksize.c b/filters/blocksize/blocksize.c > index 34f58c9..9e0e713 100644 > --- a/filters/blocksize/blocksize.c > +++ b/filters/blocksize/blocksize.c > @@ -164,6 +164,17 @@ blocksize_get_size (struct nbdkit_next_ops *next_ops, void *nxdata, > return size == -1 ? size : size & ~(minblock - 1); > } > > +static int > +blocksize_can_multi_conn (struct nbdkit_next_ops *next_ops, void *nxdata, > + void *handle) > +{ > + /* Although we are serializing all requests anyway so this likely > + * doesn't make a difference, return false because the bounce buffer > + * is not consistent for flush. > + */ > + return 0; > +}I think we could get away with returning 1, _because_ we are serializing all requests. Even though we make multiple calls into the plugin (worst-case being read-modify-write-flush on a client write-with-FUA), no other operation can compete with a client flush. But I also agree that if we were to fix the bounce buffer to be thread-safe and change the filter to be more parallel, then you are also right that unless we add a .flush callback and add locking so that all calls into the plugin are guarded within the same lock, then returning 0 is appropriate, even if the plugin itself returns 1. So I agree with adding this callback, and could go either way with you returning 0 or 1 for now. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Jan-05 22:51 UTC
Re: [Libguestfs] [PATCH nbdkit v2 06/11] plugins: Return NBD_FLAG_CAN_MULTI_CONN from some readonly plugins.
On 1/5/19 8:50 AM, Richard W.M. Jones wrote:> It is safe to return NBD_FLAG_CAN_MULTI_CONN from almost all readonly > plugins. However only do it where the plugin thread model is > NBDKIT_THREAD_MODEL_PARALLEL and the plugin could plausibly be used in > a high performance situation.Makes sense. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Jan-05 22:55 UTC
Re: [Libguestfs] [PATCH nbdkit v2 07/11] file: Implement NBDKIT_API_VERSION 2.
On 1/5/19 8:50 AM, Richard W.M. Jones wrote:> Upgrade this plugin to version 2 of the API. The main change is > implementing FUA support.I'm a bit surprised we haven't done that yet :)> --- > plugins/file/file.c | 62 ++++++++++++++++++++++++++++++--------------- > 1 file changed, 42 insertions(+), 20 deletions(-) >Looks good. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2019-Jan-05 23:01 UTC
Re: [Libguestfs] [PATCH nbdkit v2 08/11] file: Return NBD_FLAG_CAN_MULTI_CONN for the file plugin.
On 1/5/19 8:50 AM, Richard W.M. Jones wrote:> This allows multiple connections from a single client, and should be > safe assuming flush/FUA has been implemented correctly in the previous > commit.Looks safe to me between the two patches. The NBD spec also states: * A client which uses multiple connections to a server to parallelize commands MUST NOT issue an `NBD_CMD_FLUSH` request until it has received the reply for all write commands which it expects to be covered by the flush. which implies that even if we encounter this race: client A: client B: issue write issue flush flush completes write completes we don't have to worry about that case, because client B didn't wait for the response before client A requested the flush. Kernel fdatasync() semantics match what we want (all writes that any client has received a reply to will be included in the flush, even if the flush is on a different client). ACK -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Possibly Parallel Threads
- [PATCH nbdkit v2 07/11] file: Implement NBDKIT_API_VERSION 2.
- [nbdkit PATCH] file: Add trim support
- [PATCH nbdkit v2 1/3] file: Move file operators to a new common/fileops mini-library.
- [PATCH nbdkit v2 01/11] server: Implement NBD_FLAG_CAN_MULTI_CONN.
- [PATCH nbdkit 0/7] server: Implement NBD_FLAG_CAN_MULTI_CONN.