Eric Blake
2020-Feb-10 21:37 UTC
[Libguestfs] Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
I will be following up to this email with four separate threads each addressed to the appropriate single list, with proposed changes to: - the NBD protocol - qemu: both server and client - libnbd: client - nbdkit: server The feature in question adds a new optional NBD_INFO_ packet to the NBD_OPT_GO portion of handshake, adding up to 16 bits of information that the server can advertise to the client at connection time about any known initial state of the export [review to this series may propose slight changes, such as using 32 bits; but hopefully by having all four series posted in tandem it becomes easier to see whether any such tweaks are warranted, and can keep such tweaks interoperable before any of the projects land the series upstream]. For now, only 2 of those 16 bits are defined: NBD_INIT_SPARSE (the image has at least one hole) and NBD_INIT_ZERO (the image reads completely as zero); the two bits are orthogonal and can be set independently, although it is easy enough to see completely sparse files with both bits set. Also, advertising the bits is orthogonal to whether the base:allocation metacontext is used, although a server with all possible extensions is likely to have the two concepts match one another. The new bits are added as an information chunk rather than as runtime flags; this is because the intended client of this information is operations like copying a sparse image into an NBD server destination. Such a client only cares at initialization if it needs to perform a pre-zeroing pass or if it can rely on the destination already reading as zero. Once the client starts making modifications, burdening the server with the ability to do a live runtime probe of current reads-as-zero state does not help the client, and burning per-export flags for something that quickly goes stale on the first edit was not thought to be wise, similarly, adding a new NBD_CMD did not seem worthwhile. The existing 'qemu-img convert source... nbd://...' is the first command line example that can benefit from the new information; the goal of adding a protocol extension was to make this benefit automatic without the user having to specify the proposed --target-is-zero when possible. I have a similar thread pending for qemu which adds similar known-reads-zero information to qcow2 files: https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg08075.html That qemu series is at v1, and based on review it has had so far, it will need some interface changes for v2, which means my qemu series here will need a slight rebasing, but I'm posting this series to all lists now to at least demonstrate what is possible when we have better startup information. Note that with this new bit, it is possible to learn if a destination is sparse as part of NBD_OPT_GO rather than having to use block-status commands. With existing block-status commands, you can use an O(n) scan of block-status to learn if an image reads as all zeroes (or short-circuit in O(1) time if the first offset is reported as probable data rather than reading as zero); but with this new bit, the answer is O(1). So even with Vladimir's recent change to make the spec permit 4G block-status even when max block size is 32M, or the proposed work to add 64-bit block-status, you still end up with more on-the-wire traffic for block-status to learn if an image is all zeroes than if the server just advertises this bit. But by keeping both extensions orthogonal, a server can implement whichever one or both reporting methods it finds easiest, and a client can work with whatever a server supplies with sane fallbacks when the server lacks either extension. Conversely, block-status tracks live changes to the image, while this bit is only valid at connection time. My repo for each of the four projects contains a tag 'nbd-init-v1': https://repo.or.cz/nbd/ericb.git/shortlog/refs/tags/nbd-init-v1 https://repo.or.cz/qemu/ericb.git/shortlog/refs/tags/nbd-init-v1 https://repo.or.cz/libnbd/ericb.git/shortlog/refs/tags/nbd-init-v1 https://repo.or.cz/nbdkit/ericb.git/shortlog/refs/tags/nbd-init-v1 For doing interoperability testing, I find it handy to use: PATH=/path/to/built/qemu:/path/to/built/nbdkit:$PATH /path/to/libnbd/run your command here to pick up just-built qemu-nbd, nbdsh, and nbdkit that all support the feature. For quickly setting flags: nbdkit eval init_sparse='exit 0' init_zero='exit 0' ... For quickly checking flags: qemu-nbd --list ... | grep init nbdsh -u uri... -c 'print(h.get_init_flags())' -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2020-Feb-10 21:43 UTC
[Libguestfs] [libnbd PATCH 1/1] generator: Add support for NBD_INFO_INIT_STATE extension
The NBD protocol is adding an extension for servers to report the initial state of the image (for now, whether it is sparse and/or reads as all zeroes). Time to expose this to clients, via the new API nbd_get_init_flags(). The patch requires refreshing the nbd-protocol.h file from a corresponding contemporary nbdkit commit. Testing is possible with recent enough qemu. --- .gitignore | 1 + generator/generator | 40 +++++++++++++-- generator/states-newstyle-opt-go.c | 12 ++++- interop/Makefile.am | 9 +++- interop/init-zero.c | 78 ++++++++++++++++++++++++++++++ interop/init-zero.sh | 56 +++++++++++++++++++++ lib/flags.c | 14 +++++- lib/internal.h | 10 ++-- lib/nbd-protocol.h | 13 ++++- 9 files changed, 220 insertions(+), 13 deletions(-) create mode 100644 interop/init-zero.c create mode 100755 interop/init-zero.sh diff --git a/.gitignore b/.gitignore index 7536021..e356392 100644 --- a/.gitignore +++ b/.gitignore @@ -70,6 +70,7 @@ Makefile.in /html/*.?.html /include/libnbd.h /interop/dirty-bitmap +/interop/init-zero /interop/interop-nbdkit /interop/interop-nbdkit-tls-certs /interop/interop-nbdkit-tls-certs-allow-enabled diff --git a/generator/generator b/generator/generator index dea9bf7..0addc9d 100755 --- a/generator/generator +++ b/generator/generator @@ -979,7 +979,7 @@ let tls_enum = { } let all_enums = [ tls_enum ] -(* Flags. *) +(* Flags. See also Constants below. *) let cmd_flags = { flag_prefix = "CMD_FLAG"; flags = [ @@ -1969,7 +1969,8 @@ are free to pass in other contexts." ^ non_blocking_test_call_description; see_also = [SectionLink "Flag calls"; Link "add_meta_context"; - Link "block_status"; Link "aio_block_status"]; + Link "block_status"; Link "aio_block_status"; + Link "get_init_flags"]; }; "get_protocol", { @@ -1990,6 +1991,30 @@ Most modern NBD servers use C<\"newstyle-fixed\">. Link "get_tls_negotiated"]; }; + "get_init_flags", { + default_call with + args = []; ret = RInt; + permitted_states = [ Connected; Closed ]; + shortdesc = "return any init state flags advertised by the server"; + longdesc = "\ +Return the bitwise-OR of any initial state flags advertised by the +server. These flags may include C<LIBNBD_INIT_SPARSE> if the export +is not fully allocated, or C<LIBNBD_INIT_ZERO> if the export is known +to start with all contents reading as zeroes. Other flags might be +added in the future. + +Note that the flags advertised by the server refer only to the point +in time when the connection was established; libnbd does not track if +the results may have been rendered stale by intervening actions such +as write requests. Also, since both initial state bits and block +status are orthogonal extensions within the NBD protocol, there is no +requirement that a server's advertisement match what could be learned +by checking block status. +" +^ non_blocking_test_call_description; + see_also = [Link "get_handshake_flags"; Link "block_status"]; + }; + "get_size", { default_call with args = []; ret = RInt64; @@ -2292,7 +2317,7 @@ return only one extent per metadata context where that extent does not exceed C<count> bytes; however, libnbd does not validate that the server obeyed the flag."; see_also = [Link "add_meta_context"; Link "can_meta_context"; - Link "aio_block_status"]; + Link "aio_block_status"; Link "get_init_flags"]; }; "poll", { @@ -2619,7 +2644,8 @@ as described in L<libnbd(3)/Completion callbacks>. Other parameters behave as documented in L<nbd_block_status(3)>."; see_also = [SectionLink "Issuing asynchronous commands"; - Link "can_meta_context"; Link "block_status"]; + Link "can_meta_context"; Link "block_status"; + Link "get_init_flags"]; }; "aio_get_fd", { @@ -3031,6 +3057,7 @@ let first_version = [ "set_uri_allow_transports", (1, 2); "set_uri_allow_tls", (1, 2); "set_uri_allow_local_file", (1, 2); + "get_init_flags", (1, 2); (* These calls are proposed for a future version of libnbd, but * have not been added to any released version so far. @@ -3039,7 +3066,7 @@ let first_version = [ *) ] -(* Constants, etc. *) +(* Constants, etc. See also Enums and Flags above. *) let constants = [ "AIO_DIRECTION_READ", 1; "AIO_DIRECTION_WRITE", 2; @@ -3048,6 +3075,9 @@ let constants = [ "READ_DATA", 1; "READ_HOLE", 2; "READ_ERROR", 3; + + "INIT_SPARSE", 1 lsl 0; + "INIT_ZERO", 1 lsl 1; ] let metadata_namespaces = [ diff --git a/generator/states-newstyle-opt-go.c b/generator/states-newstyle-opt-go.c index 57ad55a..c2ef01e 100644 --- a/generator/states-newstyle-opt-go.c +++ b/generator/states-newstyle-opt-go.c @@ -1,5 +1,5 @@ /* nbd client library in userspace: state machine - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -59,6 +59,8 @@ STATE_MACHINE { switch (send_from_wbuf (h)) { case -1: SET_NEXT_STATE (%.DEAD); return 0; case 0: + /* For now, we don't allow the client to configure which infos to + * request, but instead rely on what the server volunteers. */ h->sbuf.nrinfos = 0; h->wbuf = &h->sbuf; h->wlen = 2; @@ -130,6 +132,14 @@ STATE_MACHINE { return 0; } break; + case NBD_INFO_INIT_STATE: + if (len != sizeof h->sbuf.or.payload.init) { + SET_NEXT_STATE (%.DEAD); + set_error (0, "handshake: incorrect NBD_INFO_INIT_STATE option reply length"); + return 0; + } + h->initflags = be16toh (h->sbuf.or.payload.init.flags); + break; default: /* XXX Handle other info types, like NBD_INFO_BLOCK_SIZE */ debug (h, "skipping unknown NBD_REP_INFO type %d", diff --git a/interop/Makefile.am b/interop/Makefile.am index 345be7c..ed5b553 100644 --- a/interop/Makefile.am +++ b/interop/Makefile.am @@ -1,5 +1,5 @@ # nbd client library in userspace -# Copyright (C) 2013-2019 Red Hat Inc. +# Copyright (C) 2013-2020 Red Hat Inc. # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public @@ -50,12 +50,14 @@ check_PROGRAMS += \ dirty-bitmap \ socket-activation-qemu-nbd \ structured-read \ + init-zero \ $(NULL) TESTS += \ interop-qemu-nbd \ dirty-bitmap.sh \ socket-activation-qemu-nbd \ structured-read.sh \ + init-zero.sh \ $(NULL) # tls tests assume the pre-existence of files created in ../tests/Makefile.am, @@ -140,6 +142,11 @@ structured_read_CPPFLAGS = -I$(top_srcdir)/include structured_read_CFLAGS = $(WARNINGS_CFLAGS) structured_read_LDADD = $(top_builddir)/lib/libnbd.la +init_zero_SOURCES = init-zero.c +init_zero_CPPFLAGS = -I$(top_srcdir)/include +init_zero_CFLAGS = $(WARNINGS_CFLAGS) +init_zero_LDADD = $(top_builddir)/lib/libnbd.la + endif HAVE_QEMU_NBD if HAVE_NBDKIT diff --git a/interop/init-zero.c b/interop/init-zero.c new file mode 100644 index 0000000..0cf1d87 --- /dev/null +++ b/interop/init-zero.c @@ -0,0 +1,78 @@ +/* NBD client library in userspace + * Copyright (C) 2013-2020 Red Hat Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/* Test init state queries. */ + +#include <config.h> + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <assert.h> +#include <stdbool.h> +#include <errno.h> + +#include <libnbd.h> + +/* Depends on init-zero.sh setting things up so that we are given a + * socket to a server advertising NBD_INIT_ZERO. + */ + +int +main (int argc, char *argv[]) +{ + struct nbd_handle *nbd; + int r; + + if (argc < 2) { + fprintf (stderr, "%s socket\n", argv[0]); + exit (EXIT_FAILURE); + } + + nbd = nbd_create (); + if (nbd == NULL) { + fprintf (stderr, "%s\n", nbd_get_error ()); + exit (EXIT_FAILURE); + } + + if (nbd_connect_unix (nbd, argv[1]) == -1) { + fprintf (stderr, "%s\n", nbd_get_error ()); + exit (EXIT_FAILURE); + } + + r = nbd_get_init_flags (nbd); + if (r == -1) { + fprintf (stderr, "%s\n", nbd_get_error ()); + exit (EXIT_FAILURE); + } + + if (!(r & LIBNBD_INIT_ZERO)) { + fprintf (stderr, "Missing expected zero init state\n"); + exit (EXIT_FAILURE); + } + + if (nbd_shutdown (nbd, 0) == -1) { + fprintf (stderr, "%s\n", nbd_get_error ()); + exit (EXIT_FAILURE); + } + + nbd_close (nbd); + + exit (EXIT_SUCCESS); +} diff --git a/interop/init-zero.sh b/interop/init-zero.sh new file mode 100755 index 0000000..349f55e --- /dev/null +++ b/interop/init-zero.sh @@ -0,0 +1,56 @@ +#!/usr/bin/env bash +# nbd client library in userspace +# Copyright (C) 2019-2020 Red Hat Inc. +# +# This library is free software; you can redistribute it and/or +# modify it under the terms of the GNU Lesser General Public +# License as published by the Free Software Foundation; either +# version 2 of the License, or (at your option) any later version. +# +# This library is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# Lesser General Public License for more details. +# +# You should have received a copy of the GNU Lesser General Public +# License along with this library; if not, write to the Free Software +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + +# Test structured read callbacks. + +source ../tests/functions.sh +set -e +set -x + +requires qemu-img --version +requires qemu-nbd --version + +sock=`mktemp -u` +files="init-zero.qcow2 $sock" +rm -f $files +cleanup_fn rm -f $files + +# qemu 5.0 added ability to track if a qcow2 image is all zeroes +# Probe to see if qemu-nbd exposes that feature, before running test +qemu-img create -f qcow2 init-zero.qcow2 64k +qemu-nbd -f qcow2 -k $sock init-zero.qcow2& +cleanup_fn kill $! +tries=0 +while test ! -e $sock && test $tries -lt 5; do + sleep 1 + tries=$((tries + 1)) +done + +case $(qemu-nbd --list -k $sock) in + *'init state'*zero*) + # Run the test. + $VG ./init-zero $sock + status=$? + ;; + *) + echo "$0: skipping because qemu-nbd does not support all-zero bit" + status=77 + ;; +esac + +exit $status diff --git a/lib/flags.c b/lib/flags.c index d55d10a..62c3252 100644 --- a/lib/flags.c +++ b/lib/flags.c @@ -1,5 +1,5 @@ /* NBD client library in userspace - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -143,6 +143,18 @@ nbd_unlocked_can_meta_context (struct nbd_handle *h, const char *name) return 0; } +int +nbd_unlocked_get_init_flags (struct nbd_handle *h) +{ + if (h->eflags == 0) { + set_error (EINVAL, "server has not returned init flags, " + "you need to connect to the server first"); + return -1; + } + + return h->initflags; +} + int64_t nbd_unlocked_get_size (struct nbd_handle *h) { diff --git a/lib/internal.h b/lib/internal.h index 6eb50d8..aa555b5 100644 --- a/lib/internal.h +++ b/lib/internal.h @@ -1,5 +1,5 @@ /* nbd client library in userspace: internal definitions - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -97,12 +97,13 @@ struct nbd_handle { uint16_t gflags; /* Export size and per-export flags, received during handshake. NB: - * These are *both* *only* valid if eflags != 0. This is because - * all servers should set NBD_FLAG_HAS_FLAGS, so eflags should - * always be != 0, and we set both fields at the same time. + * These are *only* valid if eflags != 0. This is because all + * servers should set NBD_FLAG_HAS_FLAGS, so eflags should always be + * != 0, and we set the other fields at the same time. */ uint64_t exportsize; uint16_t eflags; + uint16_t initflags; /* Flags set by the state machine to tell what protocol and whether * TLS was negotiated. @@ -159,6 +160,7 @@ struct nbd_handle { struct nbd_fixed_new_option_reply option_reply; union { struct nbd_fixed_new_option_reply_info_export export; + struct nbd_fixed_new_option_reply_info_init init; struct { struct nbd_fixed_new_option_reply_meta_context context; char str[NBD_MAX_STRING]; diff --git a/lib/nbd-protocol.h b/lib/nbd-protocol.h index df0b4c6..50531a0 100644 --- a/lib/nbd-protocol.h +++ b/lib/nbd-protocol.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -146,6 +146,7 @@ struct nbd_fixed_new_option_reply { #define NBD_INFO_NAME 1 #define NBD_INFO_DESCRIPTION 2 #define NBD_INFO_BLOCK_SIZE 3 +#define NBD_INFO_INIT_STATE 4 /* NBD_INFO_EXPORT reply (follows fixed_new_option_reply). */ struct nbd_fixed_new_option_reply_info_export { @@ -154,6 +155,16 @@ struct nbd_fixed_new_option_reply_info_export { uint16_t eflags; /* per-export flags */ } NBD_ATTRIBUTE_PACKED; +/* NBD_INFO_INIT_STATE reply. */ +struct nbd_fixed_new_option_reply_info_init { + uint16_t info; /* NBD_INFO_INIT_STATE */ + uint16_t flags; /* per-export init flags */ +} NBD_ATTRIBUTE_PACKED; + +/* Constants for use in reply to NBD_INFO_INIT_STATE. */ +#define NBD_INIT_SPARSE (1 << 0) +#define NBD_INIT_ZERO (1 << 1) + /* NBD_REP_META_CONTEXT reply (follows fixed_new_option_reply). */ struct nbd_fixed_new_option_reply_meta_context { uint32_t context_id; /* metadata context ID */ -- 2.24.1
Eric Blake
2020-Feb-10 21:43 UTC
[Libguestfs] [nbdkit PATCH 00/10] NBD_INFO_INIT_STATE extension
See the cross-posted cover letter for more details. Eric Blake (10): protocol: Add NBD_INFO_INIT_STATE extension protocol: Wire up backend support for NBD_INFO_INIT_STATE filters: Wire up filter support for NBD_INFO_INIT_STATE plugins: Wire up in-memory plugin support for NBD_INFO_INIT_STATE plugins: Wire up file-based plugin support for NBD_INFO_INIT_STATE plugins: Wire up shell plugin support for NBD_INFO_INIT_STATE plugins: Wire up python plugin support for NBD_INFO_INIT_STATE plugins: Wire up rust plugin support for NBD_INFO_INIT_STATE plugins: Wire up ocaml plugin support for NBD_INFO_INIT_STATE plugins: Wire up nbd plugin support for NBD_INFO_INIT_STATE common/protocol/nbd-protocol.h | 13 ++- common/sparse/sparse.c | 30 ++++++- common/sparse/sparse.h | 19 ++++- docs/nbdkit-filter.pod | 10 ++- docs/nbdkit-plugin.pod | 23 ++++++ filters/cache/blk.c | 7 +- filters/extentlist/extentlist.c | 28 +++++++ filters/log/log.c | 9 +- filters/log/nbdkit-log-filter.pod | 2 +- filters/noextents/nbdkit-noextents-filter.pod | 11 +-- filters/noextents/noextents.c | 18 +++- filters/truncate/truncate.c | 15 +++- include/nbdkit-filter.h | 8 +- include/nbdkit-plugin.h | 5 +- plugins/data/data.c | 21 ++++- plugins/eval/eval.c | 4 + plugins/eval/nbdkit-eval-plugin.pod | 9 +- plugins/file/file.c | 82 ++++++++++++++++--- plugins/full/full.c | 18 +++- plugins/info/info.c | 15 +++- plugins/linuxdisk/linuxdisk.c | 13 ++- plugins/memory/memory.c | 19 +++++ plugins/nbd/nbd.c | 40 ++++++++- plugins/null/null.c | 18 +++- plugins/ocaml/NBDKit.ml | 14 +++- plugins/ocaml/NBDKit.mli | 5 +- plugins/ocaml/ocaml.c | 47 +++++++++++ plugins/partitioning/partitioning.c | 13 ++- plugins/python/nbdkit-python-plugin.pod | 14 ++++ plugins/python/python.c | 16 +++- plugins/rust/nbdkit-rust-plugin.pod | 2 +- plugins/rust/src/lib.rs | 9 ++ plugins/sh/example.sh | 14 ++++ plugins/sh/methods.c | 18 +++- plugins/sh/methods.h | 4 +- plugins/sh/nbdkit-sh-plugin.pod | 8 +- plugins/sh/sh.c | 4 +- plugins/split/split.c | 59 +++++++++++-- server/backend.c | 26 ++++++ server/filters.c | 44 +++++++++- server/internal.h | 12 ++- server/plugins.c | 24 +++++- server/protocol-handshake-newstyle.c | 56 ++++++++++++- tests/Makefile.am | 8 +- tests/test-eval-init.sh | 54 ++++++++++++ tests/test-file-init.sh | 69 ++++++++++++++++ tests/test-memory-init.sh | 65 +++++++++++++++ tests/test-python-plugin.py | 8 +- tests/test_python.py | 22 ++++- 49 files changed, 988 insertions(+), 64 deletions(-) create mode 100755 tests/test-eval-init.sh create mode 100755 tests/test-file-init.sh create mode 100755 tests/test-memory-init.sh -- 2.24.1
Eric Blake
2020-Feb-10 21:43 UTC
[Libguestfs] [nbdkit PATCH 01/10] protocol: Add NBD_INFO_INIT_STATE extension
The NBD protocol is adding an extension to make it possible for servers to advertise to the client if the export is known to start life as a sparse file or with all-zero state. Although this extension is optional, clients can take advantage of it during certain operations (such as making image copying more efficient by not having to request a pre-zeroing pass). This patch exposes the constants that will be used by nbdkit, as well as shared in libnbd for client side use. Signed-off-by: Eric Blake <eblake@redhat.com> --- common/protocol/nbd-protocol.h | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/common/protocol/nbd-protocol.h b/common/protocol/nbd-protocol.h index df0b4c6..50531a0 100644 --- a/common/protocol/nbd-protocol.h +++ b/common/protocol/nbd-protocol.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -146,6 +146,7 @@ struct nbd_fixed_new_option_reply { #define NBD_INFO_NAME 1 #define NBD_INFO_DESCRIPTION 2 #define NBD_INFO_BLOCK_SIZE 3 +#define NBD_INFO_INIT_STATE 4 /* NBD_INFO_EXPORT reply (follows fixed_new_option_reply). */ struct nbd_fixed_new_option_reply_info_export { @@ -154,6 +155,16 @@ struct nbd_fixed_new_option_reply_info_export { uint16_t eflags; /* per-export flags */ } NBD_ATTRIBUTE_PACKED; +/* NBD_INFO_INIT_STATE reply. */ +struct nbd_fixed_new_option_reply_info_init { + uint16_t info; /* NBD_INFO_INIT_STATE */ + uint16_t flags; /* per-export init flags */ +} NBD_ATTRIBUTE_PACKED; + +/* Constants for use in reply to NBD_INFO_INIT_STATE. */ +#define NBD_INIT_SPARSE (1 << 0) +#define NBD_INIT_ZERO (1 << 1) + /* NBD_REP_META_CONTEXT reply (follows fixed_new_option_reply). */ struct nbd_fixed_new_option_reply_meta_context { uint32_t context_id; /* metadata context ID */ -- 2.24.1
Eric Blake
2020-Feb-10 21:43 UTC
[Libguestfs] [nbdkit PATCH 02/10] protocol: Wire up backend support for NBD_INFO_INIT_STATE
The NBD protocol is adding an extension to let servers advertise initialization state to the client: whether the image contains holes, and whether it is known to read as all zeroes. This patch wires up the backend to send the data, although it won't be until the next couple of patches allow filters and plugins to report through new callbacks that a client finally gets to see new bits set. Signed-off-by: Eric Blake <eblake@redhat.com> --- server/backend.c | 26 +++++++++++++ server/filters.c | 18 ++++++++- server/internal.h | 12 +++++- server/plugins.c | 18 ++++++++- server/protocol-handshake-newstyle.c | 56 ++++++++++++++++++++++++++-- 5 files changed, 123 insertions(+), 7 deletions(-) diff --git a/server/backend.c b/server/backend.c index 208c07b..3ebe1f4 100644 --- a/server/backend.c +++ b/server/backend.c @@ -456,6 +456,32 @@ backend_can_cache (struct backend *b, struct connection *conn) return h->can_cache; } +int +backend_init_sparse (struct backend *b, struct connection *conn) +{ + struct b_conn_handle *h = &conn->handles[b->i]; + + controlpath_debug ("%s: init_sparse", b->name); + + assert (h->handle && (h->state & HANDLE_CONNECTED)); + if (h->init_sparse == -1) + h->init_sparse = b->init_sparse (b, conn, h->handle); + return h->init_sparse; +} + +int +backend_init_zero (struct backend *b, struct connection *conn) +{ + struct b_conn_handle *h = &conn->handles[b->i]; + + controlpath_debug ("%s: init_zero", b->name); + + assert (h->handle && (h->state & HANDLE_CONNECTED)); + if (h->init_zero == -1) + h->init_zero = b->init_zero (b, conn, h->handle); + return h->init_zero; +} + int backend_pread (struct backend *b, struct connection *conn, void *buf, uint32_t count, uint64_t offset, diff --git a/server/filters.c b/server/filters.c index ed026f5..6b9d0b8 100644 --- a/server/filters.c +++ b/server/filters.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -574,6 +574,20 @@ filter_can_cache (struct backend *b, struct connection *conn, void *handle) return backend_can_cache (b->next, conn); } +static int +filter_init_sparse (struct backend *b, struct connection *conn, void *handle) +{ + /* TODO Allow filter to control this. */ + return 0; +} + +static int +filter_init_zero (struct backend *b, struct connection *conn, void *handle) +{ + /* TODO Allow filter to control this. */ + return 0; +} + static int filter_pread (struct backend *b, struct connection *conn, void *handle, void *buf, uint32_t count, uint64_t offset, @@ -705,6 +719,8 @@ static struct backend filter_functions = { .can_fua = filter_can_fua, .can_multi_conn = filter_can_multi_conn, .can_cache = filter_can_cache, + .init_sparse = filter_init_sparse, + .init_zero = filter_init_zero, .pread = filter_pread, .pwrite = filter_pwrite, .flush = filter_flush, diff --git a/server/internal.h b/server/internal.h index a1fa730..435c0b4 100644 --- a/server/internal.h +++ b/server/internal.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -186,6 +186,8 @@ struct b_conn_handle { int can_multi_conn; int can_extents; int can_cache; + int init_sparse; + int init_zero; }; static inline void @@ -204,6 +206,8 @@ reset_b_conn_handle (struct b_conn_handle *h) h->can_multi_conn = -1; h->can_extents = -1; h->can_cache = -1; + h->init_sparse = -1; + h->init_zero = -1; } struct connection { @@ -353,6 +357,8 @@ struct backend { int (*can_multi_conn) (struct backend *, struct connection *conn, void *handle); int (*can_cache) (struct backend *, struct connection *conn, void *handle); + int (*init_sparse) (struct backend *, struct connection *conn, void *handle); + int (*init_zero) (struct backend *, struct connection *conn, void *handle); int (*pread) (struct backend *, struct connection *conn, void *handle, void *buf, uint32_t count, uint64_t offset, @@ -420,6 +426,10 @@ extern int backend_can_multi_conn (struct backend *b, struct connection *conn) __attribute__((__nonnull__ (1, 2))); extern int backend_can_cache (struct backend *b, struct connection *conn) __attribute__((__nonnull__ (1, 2))); +extern int backend_init_sparse (struct backend *b, struct connection *conn) + __attribute__((__nonnull__ (1, 2))); +extern int backend_init_zero (struct backend *b, struct connection *conn) + __attribute__((__nonnull__ (1, 2))); extern int backend_pread (struct backend *b, struct connection *conn, void *buf, uint32_t count, uint64_t offset, diff --git a/server/plugins.c b/server/plugins.c index 79d98b8..9b98cc6 100644 --- a/server/plugins.c +++ b/server/plugins.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -432,6 +432,20 @@ plugin_can_cache (struct backend *b, struct connection *conn, void *handle) return NBDKIT_CACHE_NONE; } +static int +plugin_init_sparse (struct backend *b, struct connection *conn, void *handle) +{ + /* TODO Allow plugin to control this. */ + return 0; +} + +static int +plugin_init_zero (struct backend *b, struct connection *conn, void *handle) +{ + /* TODO Allow plugin to control this. */ + return 0; +} + /* Plugins and filters can call this to set the true errno, in cases * where !errno_is_preserved. */ @@ -686,6 +700,8 @@ static struct backend plugin_functions = { .can_fua = plugin_can_fua, .can_multi_conn = plugin_can_multi_conn, .can_cache = plugin_can_cache, + .init_sparse = plugin_init_sparse, + .init_zero = plugin_init_zero, .pread = plugin_pread, .pwrite = plugin_pwrite, .flush = plugin_flush, diff --git a/server/protocol-handshake-newstyle.c b/server/protocol-handshake-newstyle.c index 7179186..5ba358c 100644 --- a/server/protocol-handshake-newstyle.c +++ b/server/protocol-handshake-newstyle.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -138,6 +138,45 @@ send_newstyle_option_reply_info_export (struct connection *conn, return 0; } +static int +send_newstyle_option_reply_info_init_state (struct connection *conn, + uint32_t option, uint32_t reply) +{ + struct nbd_fixed_new_option_reply fixed_new_option_reply; + struct nbd_fixed_new_option_reply_info_init init; + uint16_t flags = 0; + int r; + + r = backend_init_sparse (backend, conn); + if (r == -1) + return r; + if (r) + flags |= NBD_INIT_SPARSE; + + r = backend_init_zero (backend, conn); + if (r == -1) + return r; + if (r) + flags |= NBD_INIT_ZERO; + + fixed_new_option_reply.magic = htobe64 (NBD_REP_MAGIC); + fixed_new_option_reply.option = htobe32 (option); + fixed_new_option_reply.reply = htobe32 (reply); + fixed_new_option_reply.replylen = htobe32 (sizeof init); + init.info = htobe16 (NBD_INFO_INIT_STATE); + init.flags = htobe16 (flags); + + if (conn->send (conn, + &fixed_new_option_reply, + sizeof fixed_new_option_reply, SEND_MORE) == -1 || + conn->send (conn, &init, sizeof init, 0) == -1) { + nbdkit_error ("write: %s: %m", name_of_nbd_opt (option)); + return -1; + } + + return 0; +} + static int send_newstyle_option_reply_meta_context (struct connection *conn, uint32_t option, uint32_t reply, @@ -496,16 +535,25 @@ negotiate_handshake_newstyle_options (struct connection *conn) exportsize) == -1) return -1; + /* The spec states that we may offer gratuitous replies, + * whether or not the client has requested them. Offering + * NBD_INFO_INIT_STATE unconditionally is easy to do. + */ + if (send_newstyle_option_reply_info_init_state (conn, option, + NBD_REP_INFO) == -1) + return -1; + /* For now we ignore all other info requests (but we must - * ignore NBD_INFO_EXPORT if it was requested, because we - * replied already above). Therefore this loop doesn't do - * much at the moment. + * ignore NBD_INFO_EXPORT or NBD_INFO_INIT_STATE if either was + * requested, because we replied already above). Therefore this + * loop doesn't do much at the moment. */ for (i = 0; i < nrinfos; ++i) { memcpy (&info, &data[4 + exportnamelen + 2 + i*2], 2); info = be16toh (info); switch (info) { case NBD_INFO_EXPORT: /* ignore - reply sent above */ break; + case NBD_INFO_INIT_STATE: /* ignore - reply sent above */ break; default: debug ("newstyle negotiation: %s: " "ignoring NBD_INFO_* request %u (%s)", -- 2.24.1
Eric Blake
2020-Feb-10 21:43 UTC
[Libguestfs] [nbdkit PATCH 03/10] filters: Wire up filter support for NBD_INFO_INIT_STATE
The NBD protocol is adding an extension to let servers advertise initialization state to the client: whether the image contains holes, and whether it is known to read as all zeroes. Most filters just pass through the plugin's result (as it is only checked once when first connecting), but we can do useful things in log (mention the setting), extentlist (answer based on the extents we are advertising), noextents (suppress the advertisement), and truncate (advertise sparse support for our added tail hole). Expand a comment in the cache filter about potential future enchnancements. Initial read-only zero status is easy to add through the extentlist filter (supply a filter list with no entries and the entire image is sparse); supplying startup zero status without overriding the plugin's runtime extents would require either a new filter or a new knob to an existing one. Signed-off-by: Eric Blake <eblake@redhat.com> --- docs/nbdkit-filter.pod | 10 +++++- filters/cache/blk.c | 7 ++-- filters/extentlist/extentlist.c | 28 +++++++++++++++ filters/log/log.c | 9 +++-- filters/log/nbdkit-log-filter.pod | 2 +- filters/noextents/nbdkit-noextents-filter.pod | 11 +++--- filters/noextents/noextents.c | 18 +++++++++- filters/truncate/truncate.c | 15 +++++++- include/nbdkit-filter.h | 8 ++++- server/filters.c | 34 ++++++++++++++++--- 10 files changed, 123 insertions(+), 19 deletions(-) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index 55dfab1..0f81684 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -413,6 +413,10 @@ cached value. =head2 C<.can_cache> +=head2 C<.init_sparse> + +=head2 C<.init_zero> + int (*can_write) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); int (*can_flush) (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -434,6 +438,10 @@ cached value. void *handle); int (*can_cache) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*init_sparse) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); + int (*init_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); These intercept the corresponding plugin methods, and control feature bits advertised to the client. @@ -821,4 +829,4 @@ Richard W.M. Jones =head1 COPYRIGHT -Copyright (C) 2013-2019 Red Hat Inc. +Copyright (C) 2013-2020 Red Hat Inc. diff --git a/filters/cache/blk.c b/filters/cache/blk.c index da27318..24d7310 100644 --- a/filters/cache/blk.c +++ b/filters/cache/blk.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018-2019 Red Hat Inc. + * Copyright (C) 2018-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -68,7 +68,7 @@ static int fd = -1; * 10 = <unused> * 11 = block cached and dirty * - * Future enhancement: + * Future enhancements: * * We need to cache information about holes, ie. blocks which read as * zeroes but are not explicitly stored in the cache. This @@ -79,6 +79,9 @@ static int fd = -1; * not have to query the plugin a second time (especially useful for * VDDK where querying extents is slow, and for qemu which [in 2019] * repeatedly requests the same information with REQ_ONE set). + * Another potential interaction is when plugin->init_zero returns 1, + * in which case we know any uncached read will see zero up until the + * cache undergoes its first eviction. */ static struct bitmap bm; diff --git a/filters/extentlist/extentlist.c b/filters/extentlist/extentlist.c index 5f4990b..ea0e347 100644 --- a/filters/extentlist/extentlist.c +++ b/filters/extentlist/extentlist.c @@ -271,6 +271,32 @@ extentlist_can_extents (struct nbdkit_next_ops *next_ops, void *nxdata, return 1; } +static int +extentlist_init_sparse (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + /* True if at least one extent is a hole */ + int i; + + for (i = 0; i < nr_extents; i++) + if (extents[i].type & NBDKIT_EXTENT_HOLE) + return 1; + return 0; +} + +static int +extentlist_init_zero (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + /* True if all extents read as zero */ + int i; + + for (i = 0; i < nr_extents; i++) + if (!(extents[i].type & NBDKIT_EXTENT_ZERO)) + return 0; + return 1; +} + /* Use ‘-D extentlist.lookup=1’ to debug the function below. */ int extentlist_debug_lookup = 0; @@ -320,6 +346,8 @@ static struct nbdkit_filter filter = { .config = extentlist_config, .config_complete = extentlist_config_complete, .can_extents = extentlist_can_extents, + .init_sparse = extentlist_init_sparse, + .init_zero = extentlist_init_zero, .extents = extentlist_extents, }; diff --git a/filters/log/log.c b/filters/log/log.c index 7eb608c..3eeb033 100644 --- a/filters/log/log.c +++ b/filters/log/log.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018-2019 Red Hat Inc. + * Copyright (C) 2018-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -262,14 +262,17 @@ log_prepare (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, int e = next_ops->can_extents (nxdata); int c = next_ops->can_cache (nxdata); int Z = next_ops->can_fast_zero (nxdata); + int is = next_ops->init_sparse (nxdata); + int iz = next_ops->init_zero (nxdata); if (size < 0 || w < 0 || f < 0 || r < 0 || t < 0 || z < 0 || F < 0 || - e < 0 || c < 0 || Z < 0) + e < 0 || c < 0 || Z < 0 || is < 0 || iz < 0) return -1; output (h, "Connect", 0, "size=0x%" PRIx64 " write=%d flush=%d " "rotational=%d trim=%d zero=%d fua=%d extents=%d cache=%d " - "fast_zero=%d", size, w, f, r, t, z, F, e, c, Z); + "fast_zero=%d init_sparse=%d init_zero=%d", size, w, f, r, t, z, + F, e, c, Z, is, iz); return 0; } diff --git a/filters/log/nbdkit-log-filter.pod b/filters/log/nbdkit-log-filter.pod index b412e7e..2a5f8a2 100644 --- a/filters/log/nbdkit-log-filter.pod +++ b/filters/log/nbdkit-log-filter.pod @@ -59,7 +59,7 @@ on the connection). An example logging session of a client that performs a single successful read is: - 2018-01-27 20:38:22.959984 connection=1 Connect size=0x400 write=1 flush=1 rotational=0 trim=0 zero=1 fua=1 extents=1 cache=0 fast_zero=0 + 2018-01-27 20:38:22.959984 connection=1 Connect size=0x400 write=1 flush=1 rotational=0 trim=0 zero=1 fua=1 extents=1 cache=0 fast_zero=0 init_sparse=0 init_zero=0 2018-01-27 20:38:23.001720 connection=1 Read id=1 offset=0x0 count=0x100 ... 2018-01-27 20:38:23.001995 connection=1 ...Read id=1 return=0 (Success) 2018-01-27 20:38:23.044259 connection=1 Disconnect transactions=1 diff --git a/filters/noextents/nbdkit-noextents-filter.pod b/filters/noextents/nbdkit-noextents-filter.pod index 0260a5c..1c69d6e 100644 --- a/filters/noextents/nbdkit-noextents-filter.pod +++ b/filters/noextents/nbdkit-noextents-filter.pod @@ -12,10 +12,11 @@ nbdkit-noextents-filter - disable extents in the underlying plugin client to detect sparse regions of the underlying disk. C<nbdkit-noextents-filter> disables this so that the plugin appears to be fully allocated, at least to a client that requests structured -replies. It is also possible to use the I<--no-sr> option to nbdkit -to prevent the client from using structured replies, which among its -other side effects will prevent the client from querying extents at -all. +replies. This filter also disables any server advertisement on +whether the image begins life with holes or zero content. It is also +possible to use the I<--no-sr> option to nbdkit to prevent the client +from using structured replies, which among its other side effects will +prevent the client from querying extents at all. This filter can be useful when combined with L<nbdkit-file-plugin(1)> serving a file from a file system known to have poor C<lseek(2)> @@ -60,4 +61,4 @@ Richard W.M. Jones =head1 COPYRIGHT -Copyright (C) 2019 Red Hat Inc. +Copyright (C) 2019-2020 Red Hat Inc. diff --git a/filters/noextents/noextents.c b/filters/noextents/noextents.c index e6ac33b..091f30b 100644 --- a/filters/noextents/noextents.c +++ b/filters/noextents/noextents.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2019 Red Hat Inc. + * Copyright (C) 2019-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -41,10 +41,26 @@ noextents_can_extents (struct nbdkit_next_ops *next_ops, void *nxdata, return 0; } +static int +noextents_init_sparse (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + return 0; +} + +static int +noextents_init_zero (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + return 0; +} + static struct nbdkit_filter filter = { .name = "noextents", .longname = "nbdkit noextents filter", .can_extents = noextents_can_extents, + .init_sparse = noextents_init_sparse, + .init_zero = noextents_init_zero, }; NBDKIT_REGISTER_FILTER(filter) diff --git a/filters/truncate/truncate.c b/filters/truncate/truncate.c index 8317449..efd1f03 100644 --- a/filters/truncate/truncate.c +++ b/filters/truncate/truncate.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018-2019 Red Hat Inc. + * Copyright (C) 2018-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -229,6 +229,18 @@ truncate_can_fast_zero (struct nbdkit_next_ops *next_ops, void *nxdata, return 1; } +/* Override the plugin's .init_sparse, because any tail is sparse. */ +static int +truncate_init_sparse (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle) +{ + struct handle *h = handle; + + if (h->real_size < h->size) + return 1; + return next_ops->init_sparse (nxdata); +} + /* Read data. */ static int truncate_pread (struct nbdkit_next_ops *next_ops, void *nxdata, @@ -426,6 +438,7 @@ static struct nbdkit_filter filter = { .prepare = truncate_prepare, .get_size = truncate_get_size, .can_fast_zero = truncate_can_fast_zero, + .init_sparse = truncate_init_sparse, .pread = truncate_pread, .pwrite = truncate_pwrite, .trim = truncate_trim, diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h index 3500317..d327bf8 100644 --- a/include/nbdkit-filter.h +++ b/include/nbdkit-filter.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -86,6 +86,8 @@ struct nbdkit_next_ops { int (*can_fua) (void *nxdata); int (*can_multi_conn) (void *nxdata); int (*can_cache) (void *nxdata); + int (*init_sparse) (void *nxdata); + int (*init_zero) (void *nxdata); int (*pread) (void *nxdata, void *buf, uint32_t count, uint64_t offset, uint32_t flags, int *err); @@ -164,6 +166,10 @@ struct nbdkit_filter { void *handle); int (*can_cache) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle); + int (*init_sparse) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); + int (*init_zero) (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle); int (*pread) (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, void *buf, uint32_t count, uint64_t offset, diff --git a/server/filters.c b/server/filters.c index 6b9d0b8..a654656 100644 --- a/server/filters.c +++ b/server/filters.c @@ -336,6 +336,20 @@ next_can_cache (void *nxdata) return backend_can_cache (b_conn->b, b_conn->conn); } +static int +next_init_sparse (void *nxdata) +{ + struct b_conn *b_conn = nxdata; + return backend_init_sparse (b_conn->b, b_conn->conn); +} + +static int +next_init_zero (void *nxdata) +{ + struct b_conn *b_conn = nxdata; + return backend_init_zero (b_conn->b, b_conn->conn); +} + static int next_pread (void *nxdata, void *buf, uint32_t count, uint64_t offset, uint32_t flags, int *err) @@ -407,6 +421,8 @@ static struct nbdkit_next_ops next_ops = { .can_fua = next_can_fua, .can_multi_conn = next_can_multi_conn, .can_cache = next_can_cache, + .init_sparse = next_init_sparse, + .init_zero = next_init_zero, .pread = next_pread, .pwrite = next_pwrite, .flush = next_flush, @@ -577,15 +593,25 @@ filter_can_cache (struct backend *b, struct connection *conn, void *handle) static int filter_init_sparse (struct backend *b, struct connection *conn, void *handle) { - /* TODO Allow filter to control this. */ - return 0; + struct backend_filter *f = container_of (b, struct backend_filter, backend); + struct b_conn nxdata = { .b = b->next, .conn = conn }; + + if (f->filter.init_sparse) + return f->filter.init_sparse (&next_ops, &nxdata, handle); + else + return backend_init_sparse (b->next, conn); } static int filter_init_zero (struct backend *b, struct connection *conn, void *handle) { - /* TODO Allow filter to control this. */ - return 0; + struct backend_filter *f = container_of (b, struct backend_filter, backend); + struct b_conn nxdata = { .b = b->next, .conn = conn }; + + if (f->filter.init_zero) + return f->filter.init_zero (&next_ops, &nxdata, handle); + else + return backend_init_zero (b->next, conn); } static int -- 2.24.1
Eric Blake
2020-Feb-10 21:43 UTC
[Libguestfs] [nbdkit PATCH 04/10] plugins: Wire up in-memory plugin support for NBD_INFO_INIT_STATE
The NBD protocol is adding an extension to let servers advertise initialization state to the client: whether the image contains holes, and whether it is known to read as all zeroes. For memory-based plugins, it is fairly easy to advertise several cases: data and memory are usually sparse and detecting zero is easy (requires new functions to the sparse_array common code), although since the sparse array is reused between consecutive clients, a later client might get different answers than the first client. The null and full plugins are obviously zero. The zero plugin doesn't return any data, so it doesn't need changes. The info plugin is never sparse, but can be all zeroes in base64exportname mode. Adding initial state support to file-based and language binding plugins will be done separately. Testing of this addition relies on a contemporary patch to libnbd adding a new nbd_get_init_flags() function for reading the advertised initial state, then demonstrating changes in state observable from the memory plugin over successive clients. Signed-off-by: Eric Blake <eblake@redhat.com> --- common/sparse/sparse.c | 30 +++++++++++++++++- common/sparse/sparse.h | 19 +++++++++++- docs/nbdkit-plugin.pod | 23 ++++++++++++++ include/nbdkit-plugin.h | 5 ++- plugins/data/data.c | 21 ++++++++++++- plugins/full/full.c | 18 ++++++++++- plugins/info/info.c | 15 ++++++++- plugins/memory/memory.c | 19 ++++++++++++ plugins/null/null.c | 18 ++++++++++- server/plugins.c | 10 ++++-- tests/Makefile.am | 2 ++ tests/test-memory-init.sh | 65 +++++++++++++++++++++++++++++++++++++++ 12 files changed, 236 insertions(+), 9 deletions(-) create mode 100755 tests/test-memory-init.sh diff --git a/common/sparse/sparse.c b/common/sparse/sparse.c index 0acfa1f..ed593f1 100644 --- a/common/sparse/sparse.c +++ b/common/sparse/sparse.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2019 Red Hat Inc. + * Copyright (C) 2017-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -44,6 +44,7 @@ #include <nbdkit-plugin.h> #include "iszero.h" +#include "rounding.h" #include "sparse.h" /* Two level directory for the sparse array. @@ -67,6 +68,8 @@ * images, plus some architectures have much larger page sizes than * others making behaviour inconsistent across arches. * + * 4. We can easily track how much of the image is allocated. + * * To achieve this we use a B-Tree-like structure. The L1 directory * contains an ordered, non-overlapping, non-contiguous list of * (offset, pointer to L2 directory). @@ -103,6 +106,8 @@ struct l1_entry { struct sparse_array { struct l1_entry *l1_dir; /* L1 directory. */ size_t l1_size; /* Number of entries in L1 directory. */ + size_t used_pages; /* Number of non-NULL L2 entries. */ + uint64_t max_pages; /* Maximum L2 pages if fully allocated. */ bool debug; }; @@ -140,6 +145,8 @@ alloc_sparse_array (bool debug) return NULL; sa->l1_dir = NULL; sa->l1_size = 0; + sa->used_pages = 0; + sa->max_pages = 0; sa->debug = debug; return sa; } @@ -254,6 +261,7 @@ lookup (struct sparse_array *sa, uint64_t offset, bool create, return NULL; } l2_dir[o] = page; + sa->used_pages++; } if (!page) return NULL; @@ -355,6 +363,7 @@ sparse_array_zero (struct sparse_array *sa, uint32_t count, uint64_t offset) __func__, offset); free (*l2_page); *l2_page = NULL; + sa->used_pages--; } } @@ -398,3 +407,22 @@ sparse_array_extents (struct sparse_array *sa, return 0; } + +void +sparse_array_set_size (struct sparse_array *sa, uint64_t size) +{ + assert (size <= INT64_MAX); + sa->max_pages = DIV_ROUND_UP (size, PAGE_SIZE); +} + +int +sparse_array_is_sparse (struct sparse_array *sa) +{ + return sa->used_pages < sa->max_pages; +} + +int +sparse_array_is_zero (struct sparse_array *sa) +{ + return !sa->used_pages; +} diff --git a/common/sparse/sparse.h b/common/sparse/sparse.h index 704ba32..6234ffe 100644 --- a/common/sparse/sparse.h +++ b/common/sparse/sparse.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2019 Red Hat Inc. + * Copyright (C) 2017-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -91,4 +91,21 @@ extern int sparse_array_extents (struct sparse_array *sa, uint32_t count, uint64_t offset, struct nbdkit_extents *extents); +/* Set the expected maximum size of the sparse array. + * + * May be called after writing initial contents, as long as those contents + * were not placed beyond the new expected size. + */ +extern void sparse_array_set_size (struct sparse_array *sa, uint64_t size); + +/* Return true if the array is sparse. + * + * The results are reliable only after a call to sparse_array_set_size. + */ +extern int sparse_array_is_sparse (struct sparse_array *sa); + +/* Return true if the array reads as all zeroes. */ +extern int sparse_array_is_zero (struct sparse_array *sa); + + #endif /* NBDKIT_SPARSE_H */ diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod index 41bffb7..d55aafd 100644 --- a/docs/nbdkit-plugin.pod +++ b/docs/nbdkit-plugin.pod @@ -712,6 +712,29 @@ This callback is not required. If omitted, then we return C<NBDKIT_CACHE_NONE> if the C<.cache> callback is missing, or C<NBDKIT_CACHE_NATIVE> if it is defined. +=head2 C<.init_sparse> + + int init_sparse (void *handle); + +This is called during the option negotiation phase to find out if the +plugin knows whether the image starts out as sparse (that is, at least +one unallocated hole, regardless of what might be read from that hole). + +This callback is not required. If omitted, then we return C<0>. + +=head2 C<.init_zero> + + int init_zero (void *handle); + +This is called during the option negotiation phase to find out if the +plugin knows whether the image starts out reading entirely as zero +(regardless of whether the image is sparse). This bit of information +is most useful to a client that plans to copy an image from elsewhere +over to a just-created file exposed by the server, because the client +can decide if it can bypass a potentially lengthy pre-zeroing pass. + +This callback is not required. If omitted, then we return C<0>. + =head2 C<.pread> int pread (void *handle, void *buf, uint32_t count, uint64_t offset, diff --git a/include/nbdkit-plugin.h b/include/nbdkit-plugin.h index b4ecf65..77bd23e 100644 --- a/include/nbdkit-plugin.h +++ b/include/nbdkit-plugin.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -136,6 +136,9 @@ struct nbdkit_plugin { int (*can_fast_zero) (void *handle); int (*preconnect) (int readonly); + + int (*init_sparse) (void *handle); + int (*init_zero) (void *handle); }; extern void nbdkit_set_error (int err); diff --git a/plugins/data/data.c b/plugins/data/data.c index 1e5cb0f..ff8b272 100644 --- a/plugins/data/data.c +++ b/plugins/data/data.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018-2019 Red Hat Inc. + * Copyright (C) 2018-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -314,6 +314,7 @@ data_config_complete (void) if (size == -1) size = data_size; nbdkit_debug ("final size: %" PRIi64, size); + sparse_array_set_size (sa, size); return 0; } @@ -378,6 +379,22 @@ data_can_fast_zero (void *handle) return 1; } +/* Does current client start with a sparse image. */ +static int +data_init_sparse (void *handle) +{ + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + return sparse_array_is_sparse (sa); +} + +/* Does current client start with all zeroes. */ +static int +data_init_zero (void *handle) +{ + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + return sparse_array_is_zero (sa); +} + /* Read data. */ static int data_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -455,6 +472,8 @@ static struct nbdkit_plugin plugin = { .can_fua = data_can_fua, .can_cache = data_can_cache, .can_fast_zero = data_can_fast_zero, + .init_sparse = data_init_sparse, + .init_zero = data_init_zero, .pread = data_pread, .pwrite = data_pwrite, .zero = data_zero, diff --git a/plugins/full/full.c b/plugins/full/full.c index 0b69a8c..9fb8d85 100644 --- a/plugins/full/full.c +++ b/plugins/full/full.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2019 Red Hat Inc. + * Copyright (C) 2017-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -111,6 +111,20 @@ full_can_cache (void *handle) return NBDKIT_CACHE_NATIVE; } +/* Always sparse. */ +static int +full_init_sparse (void *handle) +{ + return 1; +} + +/* Always reads as zero. */ +static int +full_init_zero (void *handle) +{ + return 1; +} + /* Read data. */ static int full_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -167,6 +181,8 @@ static struct nbdkit_plugin plugin = { .get_size = full_get_size, .can_multi_conn = full_can_multi_conn, .can_cache = full_can_cache, + .init_sparse = full_init_sparse, + .init_zero = full_init_zero, .pread = full_pread, .pwrite = full_pwrite, .trim = full_trim, diff --git a/plugins/info/info.c b/plugins/info/info.c index 329a368..29bff77 100644 --- a/plugins/info/info.c +++ b/plugins/info/info.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2019 Red Hat Inc. + * Copyright (C) 2017-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -50,6 +50,7 @@ #include <nbdkit-plugin.h> #include "byte-swapping.h" +#include "iszero.h" #include "tvdiff.h" /* The mode. */ @@ -380,6 +381,17 @@ info_can_cache (void *handle) return NBDKIT_CACHE_NATIVE; } +/* Initially zero in some cases. */ +static int +info_init_zero (void *handle) +{ + struct handle *h = handle; + + if (mode == MODE_TIME || mode == MODE_UPTIME || mode == MODE_CONNTIME) + return 0; + return is_zero(h->data, h->len); +} + static void update_time (struct handle *h) { @@ -444,6 +456,7 @@ static struct nbdkit_plugin plugin = { .get_size = info_get_size, .can_multi_conn = info_can_multi_conn, .can_cache = info_can_cache, + .init_zero = info_init_zero, .pread = info_pread, /* In this plugin, errno is preserved properly along error return * paths from failed system calls. diff --git a/plugins/memory/memory.c b/plugins/memory/memory.c index a96252b..bfc1249 100644 --- a/plugins/memory/memory.c +++ b/plugins/memory/memory.c @@ -101,6 +101,7 @@ memory_config_complete (void) nbdkit_error ("you must specify size=<SIZE> on the command line"); return -1; } + sparse_array_set_size (sa, size); return 0; } @@ -154,6 +155,22 @@ memory_can_fast_zero (void *handle) return 1; } +/* Does current client start with a sparse image. */ +static int +memory_init_sparse (void *handle) +{ + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + return sparse_array_is_sparse (sa); +} + +/* Does current client start with all zeroes. */ +static int +memory_init_zero (void *handle) +{ + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + return sparse_array_is_zero (sa); +} + /* Read data. */ static int memory_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -231,6 +248,8 @@ static struct nbdkit_plugin plugin = { .can_multi_conn = memory_can_multi_conn, .can_cache = memory_can_cache, .can_fast_zero = memory_can_fast_zero, + .init_sparse = memory_init_sparse, + .init_zero = memory_init_zero, .pread = memory_pread, .pwrite = memory_pwrite, .zero = memory_zero, diff --git a/plugins/null/null.c b/plugins/null/null.c index 559cb81..bea908c 100644 --- a/plugins/null/null.c +++ b/plugins/null/null.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2019 Red Hat Inc. + * Copyright (C) 2017-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -107,6 +107,20 @@ null_can_fast_zero (void *handle) return 1; } +/* Always sparse. */ +static int +null_init_sparse (void *handle) +{ + return 1; +} + +/* Always reads as zero. */ +static int +null_init_zero (void *handle) +{ + return 1; +} + /* Read data. */ static int null_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -175,6 +189,8 @@ static struct nbdkit_plugin plugin = { .can_multi_conn = null_can_multi_conn, .can_cache = null_can_cache, .can_fast_zero = null_can_fast_zero, + .init_sparse = null_init_sparse, + .init_zero = null_init_zero, .pread = null_pread, .pwrite = null_pwrite, .zero = null_zero, diff --git a/server/plugins.c b/server/plugins.c index 9b98cc6..040dc71 100644 --- a/server/plugins.c +++ b/server/plugins.c @@ -435,14 +435,20 @@ plugin_can_cache (struct backend *b, struct connection *conn, void *handle) static int plugin_init_sparse (struct backend *b, struct connection *conn, void *handle) { - /* TODO Allow plugin to control this. */ + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + + if (p->plugin.init_sparse) + return p->plugin.init_sparse (handle); return 0; } static int plugin_init_zero (struct backend *b, struct connection *conn, void *handle) { - /* TODO Allow plugin to control this. */ + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); + + if (p->plugin.init_zero) + return p->plugin.init_zero (handle); return 0; } diff --git a/tests/Makefile.am b/tests/Makefile.am index ea6b147..4e036a3 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -134,6 +134,7 @@ EXTRA_DIST = \ test-log.sh \ test-long-name.sh \ test.lua \ + test-memory-init.sh \ test-memory-largest.sh \ test-memory-largest-for-qemu.sh \ test-nbd-extents.sh \ @@ -600,6 +601,7 @@ TESTS += \ # memory plugin test. LIBGUESTFS_TESTS += test-memory TESTS += test-memory-largest.sh test-memory-largest-for-qemu.sh +TESTS += test-memory-init.sh test_memory_SOURCES = test-memory.c test.h test_memory_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS) diff --git a/tests/test-memory-init.sh b/tests/test-memory-init.sh new file mode 100755 index 0000000..11a526a --- /dev/null +++ b/tests/test-memory-init.sh @@ -0,0 +1,65 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2020 Red Hat Inc. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +# Test the initial state of the memory plugin. + +source ./functions.sh +set -e + +requires nbdsh -c 'exit (not hasattr (h, "get_init_flags"))' + +sock=`mktemp -u` +files="memory-init.pid $sock" +rm -f $files +cleanup_fn rm -f $files + +# Run nbdkit with the memory plugin. +start_nbdkit -P memory-init.pid -U $sock memory 1M + +# The image should start as sparse/zero, before we write to it +nbdsh --connect "nbd+unix://?socket=$sock" \ + -c 'assert (h.get_init_flags () == nbd.INIT_SPARSE | nbd.INIT_ZERO)' \ + -c 'h.pwrite ("hello world".encode (), 0)' + +# The image is still sparse, but no longer zero, before we fill it +nbdsh --connect "nbd+unix://?socket=$sock" \ + -c 'assert (h.get_init_flags () == nbd.INIT_SPARSE)' \ + -c 'h.pwrite (("1" * (1024*1024)).encode (), 0)' + +# The image is neither sparse nor zero, before we wipe it +nbdsh --connect "nbd+unix://?socket=$sock" \ + -c 'assert (h.get_init_flags () == 0)' \ + -c 'h.zero (1024 * 1024, 0)' + +# Once again, the image is sparse/zero +nbdsh --connect "nbd+unix://?socket=$sock" \ + -c 'assert (h.get_init_flags () == nbd.INIT_SPARSE | nbd.INIT_ZERO)' -- 2.24.1
Eric Blake
2020-Feb-10 21:43 UTC
[Libguestfs] [nbdkit PATCH 05/10] plugins: Wire up file-based plugin support for NBD_INFO_INIT_STATE
The NBD protocol is adding an extension to let servers advertise initialization state to the client: whether the image contains holes, and whether it is known to read as all zeroes. For file-based plugins, we are already probing lseek(SEEK_HOLE) to learn if extents are supported; a slight tweak to remember if that result is EOF tells us if we are sparse, and a similar lseek(SEEK_DATA) returning ENXIO tells us if the entire file is a hole, with at most two lseek calls during open (rather than one lseek for each of .can_extents, .init_sparse, and .init_zero). The split plugin is similar to file. For partitioning and linuxdisk, we know we are exposing a sparse image. For other file-based plugins, like ext2, we do not yet expose .extents so it is not worth trying to expose .init_sparse or .init_zero. A successful test depends on whether the current file system creates sparse files, but it was easy enough to ensure the test skips rather than fails when that is not the case. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/file/file.c | 82 ++++++++++++++++++++++++----- plugins/linuxdisk/linuxdisk.c | 13 ++++- plugins/partitioning/partitioning.c | 13 ++++- plugins/split/split.c | 59 +++++++++++++++++++-- tests/Makefile.am | 3 +- tests/test-file-init.sh | 69 ++++++++++++++++++++++++ 6 files changed, 219 insertions(+), 20 deletions(-) create mode 100755 tests/test-file-init.sh diff --git a/plugins/file/file.c b/plugins/file/file.c index 8c2ea07..ca7e2d9 100644 --- a/plugins/file/file.c +++ b/plugins/file/file.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2019 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -154,6 +154,9 @@ struct handle { bool can_zero_range; bool can_fallocate; bool can_zeroout; + bool can_extents; + bool init_sparse; + bool init_zero; }; /* Create the per-connection handle. */ @@ -214,6 +217,52 @@ file_open (int readonly) h->can_fallocate = true; h->can_zeroout = h->is_block_device; + h->can_extents = false; + h->init_sparse = false; + h->init_zero = false; +#ifdef SEEK_HOLE + if (!h->is_block_device) { + off_t r; + + /* A simple test to see whether SEEK_DATA/SEEK_HOLE are likely to work on + * the current filesystem, and to see if the image is sparse or zero. + */ + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lseek_lock); + r = lseek (h->fd, 0, SEEK_DATA); + if (r == -1) { + if (errno == ENXIO) { + nbdkit_debug ("extents enabled, entire image is hole"); + h->can_extents = true; + h->init_sparse = true; + h->init_zero = true; + } else { + nbdkit_debug ("extents disabled: lseek(SEEK_DATA): %m"); + } + } + else { + h->can_extents = true; + if (r > 0) { + nbdkit_debug ("extents enabled, image includes hole before data"); + h->init_sparse = true; + } + else { + r = lseek (h->fd, 0, SEEK_HOLE); + if (r == -1) { + nbdkit_debug ("extents disabled: lseek(SEEK_HOLE): %m"); + h->can_extents = false; + } + else if (r == statbuf.st_size) { + nbdkit_debug ("extents enabled, image currently all data"); + } + else { + nbdkit_debug ("extents enabled, image includes data before hole"); + h->init_sparse = true; + } + } + } + } +#endif + return h; } @@ -540,18 +589,8 @@ static int file_can_extents (void *handle) { struct handle *h = handle; - off_t r; - /* A simple test to see whether SEEK_HOLE etc is likely to work on - * the current filesystem. - */ - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lseek_lock); - r = lseek (h->fd, 0, SEEK_HOLE); - if (r == -1) { - nbdkit_debug ("extents disabled: lseek: SEEK_HOLE: %m"); - return 0; - } - return 1; + return h->can_extents; } static int @@ -622,6 +661,23 @@ file_extents (void *handle, uint32_t count, uint64_t offset, ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lseek_lock); return do_extents (handle, count, offset, flags, extents); } + +/* Initial state. */ +static int +file_init_sparse (void *handle) +{ + struct handle *h = handle; + + return h->init_sparse; +} + +static int +file_init_zero (void *handle) +{ + struct handle *h = handle; + + return h->init_zero; +} #endif /* SEEK_HOLE */ #if HAVE_POSIX_FADVISE @@ -668,6 +724,8 @@ static struct nbdkit_plugin plugin = { #ifdef SEEK_HOLE .can_extents = file_can_extents, .extents = file_extents, + .init_sparse = file_init_sparse, + .init_zero = file_init_zero, #endif #if HAVE_POSIX_FADVISE .cache = file_cache, diff --git a/plugins/linuxdisk/linuxdisk.c b/plugins/linuxdisk/linuxdisk.c index 99dbc99..0907302 100644 --- a/plugins/linuxdisk/linuxdisk.c +++ b/plugins/linuxdisk/linuxdisk.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2019 Red Hat Inc. + * Copyright (C) 2019-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -167,6 +167,16 @@ linuxdisk_can_cache (void *handle) return NBDKIT_CACHE_EMULATE; } +/* Initial state. */ +static int +linuxdisk_init_sparse (void *handle) +{ + /* region_zero regions mean we are sparse, even if we don't yet + * support .extents + */ + return 1; +} + /* Read data from the virtual disk. */ static int linuxdisk_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -230,6 +240,7 @@ static struct nbdkit_plugin plugin = { .get_size = linuxdisk_get_size, .can_multi_conn = linuxdisk_can_multi_conn, .can_cache = linuxdisk_can_cache, + .init_sparse = linuxdisk_init_sparse, .pread = linuxdisk_pread, .errno_is_preserved = 1, }; diff --git a/plugins/partitioning/partitioning.c b/plugins/partitioning/partitioning.c index 6e426b9..0b42b2e 100644 --- a/plugins/partitioning/partitioning.c +++ b/plugins/partitioning/partitioning.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018-2019 Red Hat Inc. + * Copyright (C) 2018-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -307,6 +307,16 @@ partitioning_can_cache (void *handle) return NBDKIT_CACHE_EMULATE; } +/* Initial state. */ +static int +partitioning_init_sparse (void *handle) +{ + /* region_zero regions mean we are sparse, even if we don't yet + * support .extents + */ + return 1; +} + /* Read data. */ static int partitioning_pread (void *handle, void *buf, uint32_t count, uint64_t offset) @@ -437,6 +447,7 @@ static struct nbdkit_plugin plugin = { .get_size = partitioning_get_size, .can_multi_conn = partitioning_can_multi_conn, .can_cache = partitioning_can_cache, + .init_sparse = partitioning_init_sparse, .pread = partitioning_pread, .pwrite = partitioning_pwrite, .flush = partitioning_flush, diff --git a/plugins/split/split.c b/plugins/split/split.c index 70fd4d4..ebb5f5a 100644 --- a/plugins/split/split.c +++ b/plugins/split/split.c @@ -97,6 +97,8 @@ split_config (const char *key, const char *value) struct handle { struct file *files; uint64_t size; /* Total concatenated size. */ + bool init_sparse; + bool init_zero; }; struct file { @@ -147,6 +149,8 @@ split_open (int readonly) } offset = 0; + h->init_sparse = false; /* set later as needed */ + h->init_zero = true; /* cleared later as needed */ for (i = 0; i < nr_files; ++i) { h->files[i].offset = offset; @@ -161,17 +165,44 @@ split_open (int readonly) i, filenames[i], h->files[i].offset, h->files[i].size); #ifdef SEEK_HOLE - /* Test if this file supports extents. */ + /* Test if this file supports extents, and for initial state. */ ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lseek_lock); r = lseek (h->files[i].fd, 0, SEEK_DATA); - if (r == -1 && errno != ENXIO) { - nbdkit_debug ("disabling extents: lseek on %s: %m", filenames[i]); - h->files[i].can_extents = false; + if (r == -1) { + if (errno == ENXIO) { + /* Entire file is hole */ + h->files[i].can_extents = true; + h->init_sparse = true; + } + else { + nbdkit_debug ("disabling extents: lseek(SEEK_DATA) on %s: %m", + filenames[i]); + h->files[i].can_extents = false; + h->init_zero = false; + } } - else + else { h->files[i].can_extents = true; + h->init_zero = false; + if (r > 0) + /* File includes hole before data */ + h->init_sparse = true; + else { + r = lseek (h->files[i].fd, 0, SEEK_HOLE); + if (r == -1) { + nbdkit_debug ("disabling extents: lseek(SEEK_HOLE) on %s: %m", + filenames[i]); + h->files[i].can_extents = false; + } + else if (r < statbuf.st_size) { + /* File includes data before hole */ + h->init_sparse = true; + } + } + } #else h->files[i].can_extents = false; + h->init_zero = false; #endif } h->size = offset; @@ -227,6 +258,22 @@ split_can_cache (void *handle) #endif } +static int +split_init_sparse (void *handle) +{ + struct handle *h = handle; + + return h->init_sparse; +} + +static int +split_init_zero (void *handle) +{ + struct handle *h = handle; + + return h->init_zero; +} + /* Helper function to map the offset to the correct file. */ static int compare_offset (const void *offsetp, const void *filep) @@ -450,6 +497,8 @@ static struct nbdkit_plugin plugin = { .close = split_close, .get_size = split_get_size, .can_cache = split_can_cache, + .init_sparse = split_init_sparse, + .init_zero = split_init_zero, .pread = split_pread, .pwrite = split_pwrite, #if HAVE_POSIX_FADVISE diff --git a/tests/Makefile.am b/tests/Makefile.am index 4e036a3..a8ff601 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -113,6 +113,7 @@ EXTRA_DIST = \ test-export-name.sh \ test-extentlist.sh \ test-file-extents.sh \ + test-file-init.sh \ test-floppy.sh \ test-foreground.sh \ test-fua.sh \ @@ -554,7 +555,7 @@ test_file_block_SOURCES = test-file-block.c test.h test_file_block_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS) test_file_block_LDADD = libtest.la $(LIBGUESTFS_LIBS) -TESTS += test-file-extents.sh +TESTS += test-file-extents.sh test-file-init.sh # floppy plugin test. TESTS += test-floppy.sh diff --git a/tests/test-file-init.sh b/tests/test-file-init.sh new file mode 100755 index 0000000..c66ac65 --- /dev/null +++ b/tests/test-file-init.sh @@ -0,0 +1,69 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2020 Red Hat Inc. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +# Test the initial state of the file plugin. + +source ./functions.sh +set -e + +requires nbdsh -c 'exit (not hasattr (h, "get_init_flags"))' +requires truncate --help +requires stat --help + +sock=`mktemp -u` +files="file-init.pid $sock file-init.out" +rm -f $files +cleanup_fn rm -f $files + +# See if the current file system creates sparse files +truncate --size=1M file-init.out +if test "$(stat -c %b file-init.out)" != 0; then + echo "$0: unable to create sparse file, skipping this test" + exit 77 +fi + +# Run nbdkit with the file plugin. +start_nbdkit -P file-init.pid -U $sock file file-init.out + +# The image should start as sparse/zero, before we write to it +nbdsh --connect "nbd+unix://?socket=$sock" \ + -c 'assert (h.get_init_flags () == nbd.INIT_SPARSE | nbd.INIT_ZERO)' \ + -c 'h.pwrite ("hello world".encode (), 0)' + +# The image is still sparse, but no longer zero, before we fill it +nbdsh --connect "nbd+unix://?socket=$sock" \ + -c 'assert (h.get_init_flags () == nbd.INIT_SPARSE)' \ + -c 'h.pwrite (("1" * (1024*1024)).encode (), 0)' + +# The image is neither sparse nor zero +nbdsh --connect "nbd+unix://?socket=$sock" \ + -c 'assert (h.get_init_flags () == 0)' -- 2.24.1
Eric Blake
2020-Feb-10 21:43 UTC
[Libguestfs] [nbdkit PATCH 06/10] plugins: Wire up shell plugin support for NBD_INFO_INIT_STATE
The NBD protocol is adding an extension to let servers advertise initialization state to the client: whether the image contains holes, and whether it is known to read as all zeroes. For shell-based plugins (sh and eval), it's relatively straightforward to expose the two new functions. Likewise, testing with the eval plugin is rather easy. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/eval/eval.c | 4 +++ plugins/eval/nbdkit-eval-plugin.pod | 9 +++-- plugins/sh/example.sh | 14 ++++++++ plugins/sh/methods.c | 18 +++++++++- plugins/sh/methods.h | 4 ++- plugins/sh/nbdkit-sh-plugin.pod | 8 ++++- plugins/sh/sh.c | 4 ++- tests/Makefile.am | 3 +- tests/test-eval-init.sh | 54 +++++++++++++++++++++++++++++ 9 files changed, 111 insertions(+), 7 deletions(-) create mode 100755 tests/test-eval-init.sh diff --git a/plugins/eval/eval.c b/plugins/eval/eval.c index 094cac5..0c26628 100644 --- a/plugins/eval/eval.c +++ b/plugins/eval/eval.c @@ -71,6 +71,8 @@ static const char *known_methods[] = { "extents", "flush", "get_size", + "init_sparse", + "init_zero", "is_rotational", "missing", "open", @@ -419,6 +421,8 @@ static struct nbdkit_plugin plugin = { .can_multi_conn = sh_can_multi_conn, .can_cache = sh_can_cache, .can_fast_zero = sh_can_fast_zero, + .init_sparse = sh_init_sparse, + .init_zero = sh_init_zero, .pread = sh_pread, .pwrite = sh_pwrite, diff --git a/plugins/eval/nbdkit-eval-plugin.pod b/plugins/eval/nbdkit-eval-plugin.pod index cbb4133..48937de 100644 --- a/plugins/eval/nbdkit-eval-plugin.pod +++ b/plugins/eval/nbdkit-eval-plugin.pod @@ -51,7 +51,8 @@ all other methods are identical). Create a 64M read-only disk of zeroes: nbdkit eval get_size=' echo 64M ' \ - pread=' dd if=/dev/zero count=$3 iflag=count_bytes ' + pread=' dd if=/dev/zero count=$3 iflag=count_bytes ' \ + init_zero=' exit 0 ' The following command is the eval plugin equivalent of L<nbdkit-file-plugin(1)> (except not as fast and missing many @@ -102,6 +103,10 @@ features): =item B<get_size=>SCRIPT +=item B<init_sparse=>SCRIPT + +=item B<init_zero=>SCRIPT + =item B<is_rotational=>SCRIPT =item B<open=>SCRIPT @@ -191,4 +196,4 @@ Richard W.M. Jones =head1 COPYRIGHT -Copyright (C) 2019 Red Hat Inc. +Copyright (C) 2019-2020 Red Hat Inc. diff --git a/plugins/sh/example.sh b/plugins/sh/example.sh index 99e4e89..638fea0 100755 --- a/plugins/sh/example.sh +++ b/plugins/sh/example.sh @@ -184,6 +184,20 @@ case "$1" in echo "file" ;; + init_sparse) + # Return true if file is known to be sparse at client open. + # This heuristic has false negatives, but gets many cases right. + test 1 == $(( $(stat -L -c '%b * %B < %s' $f) )) || exit 3 + ;; + + init_zero) + # Return true if file has all zero content at client open. + # A slower 'cmp $f /dev/zero' might catch more cases. + # Be careful: blindly returning true may break a second client + # if a first client modifies the file. + test 0 == $(stat -L -c '%b' $f) || exit 3 + ;; + *) # Unknown methods must exit with code 2. exit 2 diff --git a/plugins/sh/methods.c b/plugins/sh/methods.c index ccedc6c..28112eb 100644 --- a/plugins/sh/methods.c +++ b/plugins/sh/methods.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018-2019 Red Hat Inc. + * Copyright (C) 2018-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -568,6 +568,22 @@ sh_can_fast_zero (void *handle) return !r; } +int +sh_init_sparse (void *handle) +{ + const char *method = "init_sparse"; + const char *script = get_script (method); + return boolean_method (script, method, handle, 0); +} + +int +sh_init_zero (void *handle) +{ + const char *method = "init_zero"; + const char *script = get_script (method); + return boolean_method (script, method, handle, 0); +} + int sh_flush (void *handle, uint32_t flags) { diff --git a/plugins/sh/methods.h b/plugins/sh/methods.h index c5da214..31a8124 100644 --- a/plugins/sh/methods.h +++ b/plugins/sh/methods.h @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018 Red Hat Inc. + * Copyright (C) 2018-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -58,6 +58,8 @@ extern int sh_can_fua (void *handle); extern int sh_can_multi_conn (void *handle); extern int sh_can_cache (void *handle); extern int sh_can_fast_zero (void *handle); +extern int sh_init_sparse (void *handle); +extern int sh_init_zero (void *handle); extern int sh_flush (void *handle, uint32_t flags); extern int sh_trim (void *handle, uint32_t count, uint64_t offset, uint32_t flags); diff --git a/plugins/sh/nbdkit-sh-plugin.pod b/plugins/sh/nbdkit-sh-plugin.pod index 0c4f2f8..913a6b1 100644 --- a/plugins/sh/nbdkit-sh-plugin.pod +++ b/plugins/sh/nbdkit-sh-plugin.pod @@ -307,8 +307,14 @@ The script should exit with code C<0> for true or code C<3> for false. =item C<can_fast_zero> +=item C<init_sparse> + +=item C<init_zero> + /path/to/script is_rotational <handle> /path/to/script can_fast_zero <handle> + /path/to/script init_sparse <handle> + /path/to/script init_zero <handle> The script should exit with code C<0> for true or code C<3> for false. @@ -467,4 +473,4 @@ Richard W.M. Jones =head1 COPYRIGHT -Copyright (C) 2018-2019 Red Hat Inc. +Copyright (C) 2018-2020 Red Hat Inc. diff --git a/plugins/sh/sh.c b/plugins/sh/sh.c index eddba8e..1aea290 100644 --- a/plugins/sh/sh.c +++ b/plugins/sh/sh.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2018-2019 Red Hat Inc. + * Copyright (C) 2018-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -325,6 +325,8 @@ static struct nbdkit_plugin plugin = { .can_multi_conn = sh_can_multi_conn, .can_cache = sh_can_cache, .can_fast_zero = sh_can_fast_zero, + .init_sparse = sh_init_sparse, + .init_zero = sh_init_zero, .pread = sh_pread, .pwrite = sh_pwrite, diff --git a/tests/Makefile.am b/tests/Makefile.am index a8ff601..ca659d7 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -110,6 +110,7 @@ EXTRA_DIST = \ test-error100.sh \ test-error-triggered.sh \ test-eval.sh \ + test-eval-init.sh \ test-export-name.sh \ test-extentlist.sh \ test-file-extents.sh \ @@ -515,7 +516,7 @@ test_data_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS) test_data_LDADD = libtest.la $(LIBGUESTFS_LIBS) # eval plugin test. -TESTS += test-eval.sh +TESTS += test-eval.sh test-eval-init.sh # ext2 plugin test. if HAVE_EXT2 diff --git a/tests/test-eval-init.sh b/tests/test-eval-init.sh new file mode 100755 index 0000000..b72b1fc --- /dev/null +++ b/tests/test-eval-init.sh @@ -0,0 +1,54 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2020 Red Hat Inc. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +# Test the initial state of the eval plugin. + +source ./functions.sh +set -e + +requires nbdsh -c 'exit (not hasattr (h, "get_init_flags"))' + +# All four combinations of initial flags. Relies on values of NBD protocol. +for val in 0 1 2 3; do + nbdkit -v -U - eval \ + get_size='echo 1024' \ + pread='dd if=/dev/zero count=$3 iflag=count_bytes' \ + init_sparse="exit $((3 * !($val & 1)))" \ + init_zero="exit $((3 * !($val & 2)))" \ + --run 'nbdsh --uri $uri -c "assert (h.get_init_flags () == '$val')"' +done + +# Omitting is the same as returning false (the safe default). +nbdkit -v -U - eval \ + get_size='echo 1024' \ + pread='dd if=/dev/zero count=$3 iflag=count_bytes' \ + --run 'nbdsh --uri $uri -c "assert (h.get_init_flags () == 0)"' -- 2.24.1
Eric Blake
2020-Feb-10 21:44 UTC
[Libguestfs] [nbdkit PATCH 07/10] plugins: Wire up python plugin support for NBD_INFO_INIT_STATE
The NBD protocol is adding an extension to let servers advertise initialization state to the client: whether the image contains holes, and whether it is known to read as all zeroes. For python, the hardest part was figuring out how to conditionally test things depending on new-enough libnbd (and that wasn't hard). Note that the python bindings do not yet support extents, so this is one case where the new protocol addition is definitely useful when dealing with an initially-empty destination. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/python/nbdkit-python-plugin.pod | 14 ++++++++++++++ plugins/python/python.c | 16 +++++++++++++++- tests/test-python-plugin.py | 8 +++++++- tests/test_python.py | 22 +++++++++++++++++++++- 4 files changed, 57 insertions(+), 3 deletions(-) diff --git a/plugins/python/nbdkit-python-plugin.pod b/plugins/python/nbdkit-python-plugin.pod index 4065ec7..c2d1257 100644 --- a/plugins/python/nbdkit-python-plugin.pod +++ b/plugins/python/nbdkit-python-plugin.pod @@ -245,6 +245,20 @@ contents will be garbage collected. # return nbdkit.CACHE_NONE or nbdkit.CACHE_EMULATE # or nbdkit.CACHE_NATIVE +=item C<init_sparse> + +(Optional) + + def init_sparse(h): + # return a boolean + +=item C<init_zero> + +(Optional) + + def init_zero(h): + # return a boolean + =item C<pread> (Required) diff --git a/plugins/python/python.c b/plugins/python/python.c index 5e2e526..5a4a702 100644 --- a/plugins/python/python.c +++ b/plugins/python/python.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2013-2018 Red Hat Inc. + * Copyright (C) 2013-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -871,6 +871,18 @@ py_can_cache (void *handle) return NBDKIT_CACHE_NONE; } +static int +py_init_sparse (void *handle) +{ + return boolean_callback (handle, "init_sparse", NULL); +} + +static int +py_init_zero (void *handle) +{ + return boolean_callback (handle, "init_zero", NULL); +} + #define py_config_help \ "script=<FILENAME> (required) The Python plugin to run.\n" \ "[other arguments may be used by the plugin that you load]" @@ -902,6 +914,8 @@ static struct nbdkit_plugin plugin = { .can_fast_zero = py_can_fast_zero, .can_fua = py_can_fua, .can_cache = py_can_cache, + .init_sparse = py_init_sparse, + .init_zero = py_init_zero, .pread = py_pread, .pwrite = py_pwrite, diff --git a/tests/test-python-plugin.py b/tests/test-python-plugin.py index 8e90bc2..0f3faaf 100644 --- a/tests/test-python-plugin.py +++ b/tests/test-python-plugin.py @@ -1,5 +1,5 @@ # nbdkit test plugin -# Copyright (C) 2019 Red Hat Inc. +# Copyright (C) 2019-2020 Red Hat Inc. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are @@ -95,6 +95,12 @@ def can_cache (h): elif cache == "native": return nbdkit.CACHE_NATIVE +def init_sparse (h): + return cfg.get ('init_sparse', False) + +def init_zero (h): + return cfg.get ('init_zero', False) + def pread (h, buf, offset, flags): assert flags == 0 end = offset + len(buf) diff --git a/tests/test_python.py b/tests/test_python.py index 6b9f297..e19b895 100755 --- a/tests/test_python.py +++ b/tests/test_python.py @@ -1,6 +1,6 @@ #!/usr/bin/env python3 # nbdkit -# Copyright (C) 2019 Red Hat Inc. +# Copyright (C) 2019-2020 Red Hat Inc. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are @@ -155,6 +155,26 @@ class Test (unittest.TestCase): # Not yet implemented: can_extents. + def test_init_sparse_false (self): + self.connect ({"size": 512, "init_sparse": False}) + if hasattr (self.h, "get_init_flags"): + assert not self.h.get_init_flags () & nbd.INIT_SPARSE + + def test_init_sparse_true (self): + self.connect ({"size": 512, "init_sparse": True}) + if hasattr (self.h, "get_init_flags"): + assert self.h.get_init_flags () & nbd.INIT_SPARSE + + def test_init_zero_false (self): + self.connect ({"size": 512, "init_zero": False}) + if hasattr (self.h, "get_init_flags"): + assert not self.h.get_init_flags () & nbd.INIT_ZERO + + def test_init_zero_true (self): + self.connect ({"size": 512, "init_zero": True}) + if hasattr (self.h, "get_init_flags"): + assert self.h.get_init_flags () & nbd.INIT_ZERO + def test_pread (self): """Test pread.""" self.connect ({"size": 512}) -- 2.24.1
Eric Blake
2020-Feb-10 21:44 UTC
[Libguestfs] [nbdkit PATCH 08/10] plugins: Wire up rust plugin support for NBD_INFO_INIT_STATE
The NBD protocol is adding an extension to let servers advertise initialization state to the client: whether the image contains holes, and whether it is known to read as all zeroes. For Rust, the changes are just copy-and-paste, but I wasn't sure how to test them beyond a successful build. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/rust/nbdkit-rust-plugin.pod | 2 +- plugins/rust/src/lib.rs | 9 +++++++++ 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/plugins/rust/nbdkit-rust-plugin.pod b/plugins/rust/nbdkit-rust-plugin.pod index 77eb0c4..5fcd7ff 100644 --- a/plugins/rust/nbdkit-rust-plugin.pod +++ b/plugins/rust/nbdkit-rust-plugin.pod @@ -103,7 +103,7 @@ C<nbdkit::ThreadModel::Parallel>. =over 4 -=item Missing: C<can_extents> and C<extents> +=item Missing: C<can_extents>, C<extents>, C<preconnect>. These are not yet supported. diff --git a/plugins/rust/src/lib.rs b/plugins/rust/src/lib.rs index 2d691e7..58bbc62 100644 --- a/plugins/rust/src/lib.rs +++ b/plugins/rust/src/lib.rs @@ -107,6 +107,12 @@ pub struct Plugin { pub thread_model: Option<extern fn () -> ThreadModel>, pub can_fast_zero: Option<extern fn (h: *mut c_void) -> c_int>, + + // Slot for preconnect function, which needs more integration. + _preconnect: Option<extern fn ()>, + + pub init_sparse: Option<extern fn (h: *mut c_void) -> c_int>, + pub init_zero: Option<extern fn (h: *mut c_void) -> c_int>, } #[repr(C)] @@ -166,6 +172,9 @@ impl Plugin { cache: None, thread_model: None, can_fast_zero: None, + _preconnect: None, + init_sparse: None, + init_zero: None, } } } -- 2.24.1
Eric Blake
2020-Feb-10 21:44 UTC
[Libguestfs] [nbdkit PATCH 09/10] plugins: Wire up ocaml plugin support for NBD_INFO_INIT_STATE
The NBD protocol is adding an extension to let servers advertise initialization state to the client: whether the image contains holes, and whether it is known to read as all zeroes. For Ocaml, the changes are just copy-and-paste, but I wasn't sure how to test them beyond a successful build. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/ocaml/NBDKit.ml | 14 +++++++++++- plugins/ocaml/NBDKit.mli | 5 ++++- plugins/ocaml/ocaml.c | 47 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 64 insertions(+), 2 deletions(-) diff --git a/plugins/ocaml/NBDKit.ml b/plugins/ocaml/NBDKit.ml index 85c30a1..2db011f 100644 --- a/plugins/ocaml/NBDKit.ml +++ b/plugins/ocaml/NBDKit.ml @@ -100,6 +100,9 @@ type 'a plugin = { can_fast_zero : ('a -> bool) option; preconnect : (bool -> unit) option; + + init_sparse : ('a -> bool) option; + init_zero : ('a -> bool) option; } let default_callbacks = { @@ -149,6 +152,9 @@ let default_callbacks = { can_fast_zero = None; preconnect = None; + + init_sparse = None; + init_zero = None; } external set_name : string -> unit = "ocaml_nbdkit_set_name" "noalloc" @@ -198,6 +204,9 @@ external set_can_fast_zero : ('a -> bool) -> unit = "ocaml_nbdkit_set_can_fast_z external set_preconnect : (bool -> unit) -> unit = "ocaml_nbdkit_set_preconnect" +external set_init_sparse : ('a -> bool) -> unit = "ocaml_nbdkit_set_init_sparse" +external set_init_zero : ('a -> bool) -> unit = "ocaml_nbdkit_set_init_zero" + let may f = function None -> () | Some a -> f a let register_plugin plugin @@ -265,7 +274,10 @@ let register_plugin plugin may set_can_fast_zero plugin.can_fast_zero; - may set_preconnect plugin.preconnect + may set_preconnect plugin.preconnect; + + may set_init_sparse plugin.init_sparse; + may set_init_zero plugin.init_zero; external _set_error : int -> unit = "ocaml_nbdkit_set_error" "noalloc" diff --git a/plugins/ocaml/NBDKit.mli b/plugins/ocaml/NBDKit.mli index 4cdf911..966b1bb 100644 --- a/plugins/ocaml/NBDKit.mli +++ b/plugins/ocaml/NBDKit.mli @@ -105,12 +105,15 @@ type 'a plugin = { can_fast_zero : ('a -> bool) option; preconnect : (bool -> unit) option; + + init_sparse : ('a -> bool) option; + init_zero : ('a -> bool) option; } (** The plugin fields and callbacks. ['a] is the handle type. *) val default_callbacks : 'a plugin (** The plugin with all fields set to [None], so you can write - [{ defaults_callbacks with field1 = Some foo1; field2 = Some foo2 }] *) + [{ default_callbacks with field1 = Some foo1; field2 = Some foo2 }] *) val register_plugin : 'a plugin -> unit (** Register the plugin with nbdkit. *) diff --git a/plugins/ocaml/ocaml.c b/plugins/ocaml/ocaml.c index a7d188f..db88f68 100644 --- a/plugins/ocaml/ocaml.c +++ b/plugins/ocaml/ocaml.c @@ -138,6 +138,9 @@ static value can_fast_zero_fn; static value preconnect_fn; +static value init_sparse_fn; +static value init_zero_fn; + /*----------------------------------------------------------------------*/ /* Wrapper functions that translate calls from C (ie. nbdkit) to OCaml. */ @@ -747,6 +750,44 @@ preconnect_wrapper (int readonly) CAMLreturnT (int, 1); } +static int +init_sparse_wrapper (void *h) +{ + CAMLparam0 (); + CAMLlocal1 (rv); + + caml_leave_blocking_section (); + + rv = caml_callback_exn (init_sparse_fn, *(value *) h); + if (Is_exception_result (rv)) { + nbdkit_error ("%s", caml_format_exception (Extract_exception (rv))); + caml_enter_blocking_section (); + CAMLreturnT (int, -1); + } + + caml_enter_blocking_section (); + CAMLreturnT (int, Bool_val (rv)); +} + +static int +init_zero_wrapper (void *h) +{ + CAMLparam0 (); + CAMLlocal1 (rv); + + caml_leave_blocking_section (); + + rv = caml_callback_exn (init_zero_fn, *(value *) h); + if (Is_exception_result (rv)) { + nbdkit_error ("%s", caml_format_exception (Extract_exception (rv))); + caml_enter_blocking_section (); + CAMLreturnT (int, -1); + } + + caml_enter_blocking_section (); + CAMLreturnT (int, Bool_val (rv)); +} + /*----------------------------------------------------------------------*/ /* set_* functions called from OCaml code at load time to initialize * fields in the plugin struct. @@ -838,6 +879,9 @@ SET(can_fast_zero) SET(preconnect) +SET(init_sparse) +SET(init_zero) + #undef SET static void @@ -886,6 +930,9 @@ remove_roots (void) REMOVE (preconnect); + REMOVE (init_sparse); + REMOVE (init_zero); + #undef REMOVE } -- 2.24.1
Eric Blake
2020-Feb-10 21:44 UTC
[Libguestfs] [nbdkit PATCH 10/10] plugins: Wire up nbd plugin support for NBD_INFO_INIT_STATE
The NBD protocol is adding an extension to let servers advertise initialization state to the client: whether the image contains holes, and whether it is known to read as all zeroes. With recent enough libnbd, the nbd plugin can pass this information through. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/nbd/nbd.c | 40 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 39 insertions(+), 1 deletion(-) diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c index d020bee..8f06420 100644 --- a/plugins/nbd/nbd.c +++ b/plugins/nbd/nbd.c @@ -1,5 +1,5 @@ /* nbdkit - * Copyright (C) 2017-2019 Red Hat Inc. + * Copyright (C) 2017-2020 Red Hat Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are @@ -702,6 +702,42 @@ nbdplug_can_extents (void *handle) return i; } +static int +nbdplug_init_sparse (void *handle) +{ + struct handle *h = handle; + int i = 0; + +#if LIBNBD_HAVE_NBD_GET_INIT_FLAGS + i = nbd_get_init_flags (h->nbd); + + if (i == -1) { + nbdkit_error ("failure to check init flags: %s", nbd_get_error ()); + return -1; + } + i = !!(i & LIBNBD_INIT_SPARSE); +#endif + return i; +} + +static int +nbdplug_init_zero (void *handle) +{ + struct handle *h = handle; + int i = 0; + +#if LIBNBD_HAVE_NBD_GET_INIT_FLAGS + i = nbd_get_init_flags (h->nbd); + + if (i == -1) { + nbdkit_error ("failure to check init flags: %s", nbd_get_error ()); + return -1; + } + i = !!(i & LIBNBD_INIT_ZERO); +#endif + return i; +} + /* Read data from the file. */ static int nbdplug_pread (void *handle, void *buf, uint32_t count, uint64_t offset, @@ -860,6 +896,8 @@ static struct nbdkit_plugin plugin = { .can_multi_conn = nbdplug_can_multi_conn, .can_extents = nbdplug_can_extents, .can_cache = nbdplug_can_cache, + .init_sparse = nbdplug_init_sparse, + .init_zero = nbdplug_init_zero, .pread = nbdplug_pread, .pwrite = nbdplug_pwrite, .zero = nbdplug_zero, -- 2.24.1
Richard W.M. Jones
2020-Feb-10 22:12 UTC
Re: [Libguestfs] Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
On Mon, Feb 10, 2020 at 03:37:20PM -0600, Eric Blake wrote:> For now, only 2 of those 16 bits are defined: NBD_INIT_SPARSE (the > image has at least one hole) and NBD_INIT_ZERO (the image reads > completely as zero); the two bits are orthogonal and can be set > independently, although it is easy enough to see completely sparse > files with both bits set.I think I'm confused about the exact meaning of NBD_INIT_SPARSE. Do you really mean the whole image is sparse; or (as you seem to have said above) that there exists a hole somewhere in the image but we're not saying where it is and there can be non-sparse parts of the image? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Richard W.M. Jones
2020-Feb-10 22:24 UTC
Re: [Libguestfs] [libnbd PATCH 1/1] generator: Add support for NBD_INFO_INIT_STATE extension
The idea and patch is fine, but I wonder if it would be more useful to callers if it was exposed as two separate APIs. Callers would then not need to deal with masking out unknown flags, and it works more like the other is_* / can_* ("flag calls") we already have. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Eric Blake
2020-Feb-10 22:29 UTC
Re: [Libguestfs] Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
On 2/10/20 4:12 PM, Richard W.M. Jones wrote:> On Mon, Feb 10, 2020 at 03:37:20PM -0600, Eric Blake wrote: >> For now, only 2 of those 16 bits are defined: NBD_INIT_SPARSE (the >> image has at least one hole) and NBD_INIT_ZERO (the image reads >> completely as zero); the two bits are orthogonal and can be set >> independently, although it is easy enough to see completely sparse >> files with both bits set. > > I think I'm confused about the exact meaning of NBD_INIT_SPARSE. Do > you really mean the whole image is sparse; or (as you seem to have > said above) that there exists a hole somewhere in the image but we're > not saying where it is and there can be non-sparse parts of the image?As implemented: NBD_INIT_SPARSE - there is at least one hole somewhere (allocation would be required to write to that part of the file), but there may b allocated data elsewhere in the image. Most disk images will fit this definition (for example, it is very common to have a hole between the MBR or GPT and the first partition containing a file system, or for file systems themselves to be sparse within the larger block device). NBD_INIT_ZERO - all bytes read as zero. The combination NBD_INIT_SPARSE|NBD_INIT_ZERO is common (generally, if you use lseek(SEEK_DATA) to prove the entire image reads as zeroes, you also know the entire image is sparse), but NBD_INIT_ZERO in isolation is also possible (especially with the qcow2 proposal of a persistent autoclear bit, where even with a fully preallocated qcow2 image you still know it reads as zeroes but there are no holes). But you are also right that for servers that can advertise both bits efficiently, NBD_INIT_SPARSE in isolation may be more common than NBD_INIT_SPARSE|NBD_INIT_ZERO (the former for most disk images, the latter only for a freshly-created image that happens to create with zero initialization). What's more, in my patches, I did NOT patch qemu to set or consume INIT_SPARSE; so far, it only sets/consumes INIT_ZERO. Of course, if we can find a reason WHY qemu should track whether a qcow2 image is fully-allocated, by demonstrating a qemu-img algorithm that becomes easier for knowing if an image is sparse (even if our justification is: "when copying an image, I want to know if the _source_ is sparse, to know whether I have to bend over backwards to preallocate the destination"), then using that in qemu makes sense for my v2 patches. But for v1, my only justification was "when copying an image, I can skip holes in the source if I know the _destination_ already reads as zeroes", which only needed INIT_ZERO. Some of the nbdkit patches demonstrate the some-vs.-all nature of the two bits; for example, in the split plugin, I initialize h->init_sparse = false; h->init_zero = true; then in a loop over each file change h->init_sparse to true if at least one file was sparse, and change h->init_zero to false if at least one file had non-zero contents. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Max Reitz
2020-Feb-17 15:13 UTC
Re: [Libguestfs] Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
Hi, It’s my understanding that without some is_zero infrastructure for QEMU, it’s impossible to implement this flag in qemu’s NBD server. At the same time, I still haven’t understood what we need the flag for. As far as I understood in our discussion on your qemu series, there is no case where anyone would need to know whether an image is zero. All practical cases involve someone having to ensure that some image is zero. Knowing whether an image is zero can help with that, but that can be an implementation detail. For qcow2, the idea would be that there is some flag that remains true as long as the image is guaranteed to be zero. Then we’d have some bdrv_make_zero function, and qcow2’s implementation would use this information to gauge whether there’s something to do as all. For NBD, we cannot use this idea directly because to implement such a flag (as you’re describing in this mail), we’d need separate is_zero infrastructure, and that kind of makes the point of “drivers’ bdrv_make_zero() implementations do the right thing by themselves” moot. OTOH, we wouldn’t need such a flag for the implementation, because we could just send a 64-bit discard/make_zero over the whole block device length to the NBD server, and then the server internally does the right thing(TM). AFAIU discard and write_zeroes currently have only 32 bit length fields, but there were plans for adding support for 64 bit versions anyway. From my naïve outsider perspective, doing that doesn’t seem a more complicated protocol addition than adding some way to tell whether an NBD export is zero. So I’m still wondering whether there are actually cases where we need to tell whether some image or NBD export is zero that do not involve making it zero if it isn’t. (I keep asking because it seems to me that if all we ever really want to do is to ensure that some images/exports are zero, we should implement that.) Max
Eric Blake
2020-Feb-18 20:55 UTC
Re: [Libguestfs] Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
On 2/17/20 9:13 AM, Max Reitz wrote:> Hi, > > It’s my understanding that without some is_zero infrastructure for QEMU, > it’s impossible to implement this flag in qemu’s NBD server.You're right that we may need some more infrastructure before being able to decide when to report this bit in all cases. But for raw files, that infrastructure already exists: does block_status at offset 0 and the entire image as length return status that the entire file is a hole. And for qcow2 files, it would not be that hard to teach a similar block_status request to report the entire image as a hole based on my proposed qcow2 autoclear bit tracking that the image still reads as zero.> > At the same time, I still haven’t understood what we need the flag for. > > As far as I understood in our discussion on your qemu series, there is > no case where anyone would need to know whether an image is zero. All > practical cases involve someone having to ensure that some image is > zero. Knowing whether an image is zero can help with that, but that can > be an implementation detail. > > For qcow2, the idea would be that there is some flag that remains true > as long as the image is guaranteed to be zero. Then we’d have some > bdrv_make_zero function, and qcow2’s implementation would use this > information to gauge whether there’s something to do as all. > > For NBD, we cannot use this idea directly because to implement such a > flag (as you’re describing in this mail), we’d need separate is_zero > infrastructure, and that kind of makes the point of “drivers’ > bdrv_make_zero() implementations do the right thing by themselves” moot.We don't necessarily need a separate is_zero infrastructure if we can instead teach the existing block_status infrastructure to report that the entire image reads as zero. You're right that clients that need to force an entire image to be zero won't need to directly call block_status (they can just call bdrv_make_zero, and let that worry about whether a block status call makes sense among its list of steps to try). But since block_status can report all-zero status for some cases, it's not hard to use that for feeding the NBD bit. However, there's a difference between qemu's block status (which is already typed correctly to return a 64-bit answer, even if it may need a few tweaks for clients that currently don't expect it to request more than 32 bits) and NBD's block status (which can only report 32 bits barring a new extension to the protocol), and where a single all-zero bit at NBD_OPT_GO is just as easy of an extension as a way to report a 64-bit all-zero response to NBD_CMD_BLOCK_STATUS.> > OTOH, we wouldn’t need such a flag for the implementation, because we > could just send a 64-bit discard/make_zero over the whole block device > length to the NBD server, and then the server internally does the right > thing(TM). AFAIU discard and write_zeroes currently have only 32 bit > length fields, but there were plans for adding support for 64 bit > versions anyway. From my naïve outsider perspective, doing that doesn’t > seem a more complicated protocol addition than adding some way to tell > whether an NBD export is zero.Adding 64-bit commands to NBD is more invasive than adding a single startup status bit. Both ideas can be done - doing one does not preclude the other. But at the same time, not all servers will implement both ideas - if one is easy to implement while the other is hard, it is not unlikely that qemu will still encounter NBD servers that advertise startup state but not support 64-bit make_zero (even if qemu as NBD server starts supporting 64-bit make zero) or even 64-bit block status results. Another thing to think about here is timing. With the proposed NBD addition, it is the server telling the client that "the image you are connecting to started zero", prior to the point that the client even has a chance to request "can you make the image all zero in a quick manner (and if not, I'll fall back to writing zeroes as I go)". And even if NBD gains a 64-bit block status and/or make zero command, it is still less network traffic for the server to advertise up-front that the image is all zero than it is for the client to have to issue command requests of the server (network traffic is not always the bottleneck, but it can be a consideration).> > So I’m still wondering whether there are actually cases where we need to > tell whether some image or NBD export is zero that do not involve making > it zero if it isn’t.Just because we don't think that qemu-img has such a case does not mean that other NBD clients will not be able to come up with some use for knowing if an image starts all zero.> > (I keep asking because it seems to me that if all we ever really want to > do is to ensure that some images/exports are zero, we should implement > that.)The problem is WHERE do you implement it. Is it more efficient to implement make_zero in the NBD server (the client merely requests to make zero, but lets the server do all the work) or in the NBD client (the client learns whether the server is already zero, and not hearing yes, the client proceeds to do all the work to write zeroes). From the qemu perspective, qemu-img convert needs the image to be zero, and bdrv_make_zero will report back either "yes I quickly made it zero, possibly by doing nothing" or "no, making it zero now is no more efficient than you just writing zeroes as you go". But although the code in qemu-img is the same whether bdrv_make_zero is able to request the work be done in the server or whether the work has to be done in the client, the code in the block layer that implements bdrv_make_zero may itself care about knowing whether the NBD server started all zero. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Reasonably Related Threads
- Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
- Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
- Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE
- [nbdkit PATCH 05/10] plugins: Wire up file-based plugin support for NBD_INFO_INIT_STATE
- Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE