Eric Blake
2017-Jan-24 03:13 UTC
[Libguestfs] [nbdkit PATCH 0/2] bind .zero to more languages
Begin the language binding followups to my new .zero callback, since Rich was indeed correct that we want them. I'm more familiar with python and perl (at least to the point that I was able to modify the appropriate example files and prove to myself that the bindings worked), so I've started with those. I'm less familiar with ruby and ocaml, so I've left those for tomorrow (it will rely heavily on copy-and-paste); it is also a chance to review if my decision of requiring a return value in order to trigger the graceful fallback is the best approach. Eric Blake (2): perl: Support zero callback python: Support zero callback plugins/perl/nbdkit-perl-plugin.pod | 25 ++++++++++++++++++++++ plugins/perl/perl.c | 37 +++++++++++++++++++++++++++++++++ plugins/python/nbdkit-python-plugin.pod | 18 ++++++++++++++++ plugins/python/python.c | 35 +++++++++++++++++++++++++++++++ 4 files changed, 115 insertions(+) -- 2.9.3
Eric Blake
2017-Jan-24 03:13 UTC
[Libguestfs] [nbdkit PATCH 1/2] perl: Support zero callback
Add a perl language binding for the .zero callback, used for implementing NBD_CMD_WRITE_ZEROES. Unlike the pwrite callback with no return type, and unlike C where we can manipulate errno, we need a way to distinguish between hard error (the usual die), and the fallback error (return false so the binding can turn it into EOPNOTSUPP), so success requires returning true. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/perl/nbdkit-perl-plugin.pod | 25 +++++++++++++++++++++++++ plugins/perl/perl.c | 37 +++++++++++++++++++++++++++++++++++++ 2 files changed, 62 insertions(+) diff --git a/plugins/perl/nbdkit-perl-plugin.pod b/plugins/perl/nbdkit-perl-plugin.pod index 3d0ce14..39df7b1 100644 --- a/plugins/perl/nbdkit-perl-plugin.pod +++ b/plugins/perl/nbdkit-perl-plugin.pod @@ -271,6 +271,31 @@ store. If there is an error, the function should call C<die>. +=item C<zero> + +(Optional) + + sub zero + { + my $handle = shift; + my $count = shift; + my $offset = shift; + my $may_trim = shift; + my $bool = ...; + return $bool; + } + +The body of your C<zero> function should ensure that C<$count> bytes +of the disk, starting at C<$offset>, will read back as zero. If +C<$may_trim> is true, the operation may be optimized as a trim as long +as subsequent reads see zeroes. Return true if the write was +successful, and false to trigger a graceful fallback to C<pwrite>. + +NBD only supports whole writes, so your function should try to write +the whole region (perhaps requiring a loop). If the write fails or is +partial, and you do not want the fallback to C<pwrite>, your function +should C<die>. + =back =head2 MISSING CALLBACKS diff --git a/plugins/perl/perl.c b/plugins/perl/perl.c index 935e1ba..ba42864 100644 --- a/plugins/perl/perl.c +++ b/plugins/perl/perl.c @@ -494,6 +494,42 @@ perl_can_trim (void *handle) } static int +perl_zero (void *handle, uint32_t count, uint64_t offset, int may_trim) +{ + dSP; + SV *sv; + int r = 0; + + if (callback_defined ("zero")) { + ENTER; + SAVETMPS; + PUSHMARK (SP); + XPUSHs (handle); + XPUSHs (sv_2mortal (newSViv (count))); + XPUSHs (sv_2mortal (newSViv (offset))); + XPUSHs (sv_2mortal (newSViv (may_trim))); + PUTBACK; + call_pv ("zero", G_EVAL|G_SCALAR); + SPAGAIN; + sv = POPs; + r = SvIV (sv); + PUTBACK; + FREETMPS; + LEAVE; + + if (check_perl_failure () == -1) + return -1; + + if (r) + return 0; + } + + nbdkit_debug ("zero falling back to pwrite"); + errno = EOPNOTSUPP; + return -1; +} + +static int perl_is_rotational (void *handle) { dSP; @@ -614,6 +650,7 @@ static struct nbdkit_plugin plugin = { .pwrite = perl_pwrite, .flush = perl_flush, .trim = perl_trim, + .zero = perl_zero, }; NBDKIT_REGISTER_PLUGIN(plugin) -- 2.9.3
Eric Blake
2017-Jan-24 03:13 UTC
[Libguestfs] [nbdkit PATCH 2/2] python: Support zero callback
Add a python language binding for the .zero callback, used for implementing NBD_CMD_WRITE_ZEROES. Unlike the pwrite callback with no return type, and unlike C where we can manipulate errno, we need a way to distinguish between hard error (the usual exception), and the fallback error (return false so the binding can turn it into EOPNOTSUPP), so success requires returning true. Signed-off-by: Eric Blake <eblake@redhat.com> --- plugins/python/nbdkit-python-plugin.pod | 18 +++++++++++++++++ plugins/python/python.c | 35 +++++++++++++++++++++++++++++++++ 2 files changed, 53 insertions(+) diff --git a/plugins/python/nbdkit-python-plugin.pod b/plugins/python/nbdkit-python-plugin.pod index 9b0f0ef..b3eb883 100644 --- a/plugins/python/nbdkit-python-plugin.pod +++ b/plugins/python/nbdkit-python-plugin.pod @@ -196,6 +196,24 @@ L<fdatasync(2)> or equivalent on the backing store. The body of your C<trim> function should "punch a hole" in the backing store. +=item C<zero> + +(Optional) + + def zero(h, count, offset, may_trim): + # return a boolean + +The body of your C<zero> function should ensure that C<count> bytes +of the disk, starting at C<offset>, will read back as zero. If +C<may_trim> is true, the operation may be optimized as a trim as long +as subsequent reads see zeroes. Return true if the write was +successful, and false to trigger a graceful fallback to C<pwrite>. + +NBD only supports whole writes, so your function should try to +write the whole region (perhaps requiring a loop). If the write +fails or is partial, and you do not want the fallback to C<pwrite>, +your function should throw an exception. + =back =head2 MISSING CALLBACKS diff --git a/plugins/python/python.c b/plugins/python/python.c index 0504715..46f50f2 100644 --- a/plugins/python/python.c +++ b/plugins/python/python.c @@ -423,6 +423,40 @@ py_trim (void *handle, uint32_t count, uint64_t offset) } static int +py_zero (void *handle, uint32_t count, uint64_t offset, int may_trim) +{ + PyObject *obj = handle; + PyObject *fn; + PyObject *args; + PyObject *r; + int ret; + + if (callback_defined ("zero", &fn)) { + PyErr_Clear (); + + args = PyTuple_New (4); + Py_INCREF (obj); /* decremented by Py_DECREF (args) */ + PyTuple_SetItem (args, 0, obj); + PyTuple_SetItem (args, 1, PyLong_FromUnsignedLongLong (count)); + PyTuple_SetItem (args, 2, PyLong_FromUnsignedLongLong (offset)); + PyTuple_SetItem (args, 3, PyBool_FromLong (may_trim)); + r = PyObject_CallObject (fn, args); + Py_DECREF (fn); + Py_DECREF (args); + if (check_python_failure ("zero") == -1) + return -1; + ret = r == Py_True; + Py_DECREF (r); + if (ret) + return 0; + } + + nbdkit_debug ("zero falling back to pwrite"); + errno = EOPNOTSUPP; + return -1; +} + +static int py_can_write (void *handle) { PyObject *obj = handle; @@ -579,6 +613,7 @@ static struct nbdkit_plugin plugin = { .pwrite = py_pwrite, .flush = py_flush, .trim = py_trim, + .zero = py_zero, }; NBDKIT_REGISTER_PLUGIN(plugin) -- 2.9.3
Richard W.M. Jones
2017-Jan-24 11:38 UTC
Re: [Libguestfs] [nbdkit PATCH 0/2] bind .zero to more languages
On Mon, Jan 23, 2017 at 09:13:23PM -0600, Eric Blake wrote:> Begin the language binding followups to my new .zero callback, since > Rich was indeed correct that we want them. > > I'm more familiar with python and perl (at least to the point that > I was able to modify the appropriate example files and prove to > myself that the bindings worked), so I've started with those. > I'm less familiar with ruby and ocaml, so I've left those for > tomorrow (it will rely heavily on copy-and-paste); it is also a > chance to review if my decision of requiring a return value in > order to trigger the graceful fallback is the best approach.The other methods (eg. trim) deal with this by having a separate method called "can_*" (eg. can_trim). Can we have a can_zero callback instead? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Eric Blake
2017-Jan-24 14:51 UTC
Re: [Libguestfs] [nbdkit PATCH 0/2] bind .zero to more languages
On 01/24/2017 05:38 AM, Richard W.M. Jones wrote:> On Mon, Jan 23, 2017 at 09:13:23PM -0600, Eric Blake wrote: >> Begin the language binding followups to my new .zero callback, since >> Rich was indeed correct that we want them. >> >> I'm more familiar with python and perl (at least to the point that >> I was able to modify the appropriate example files and prove to >> myself that the bindings worked), so I've started with those. >> I'm less familiar with ruby and ocaml, so I've left those for >> tomorrow (it will rely heavily on copy-and-paste); it is also a >> chance to review if my decision of requiring a return value in >> order to trigger the graceful fallback is the best approach. > > The other methods (eg. trim) deal with this by having a separate > method called "can_*" (eg. can_trim). Can we have a can_zero callback > instead?I thought about that, but there's several problems: 1) the can_trim and similar functions are required in order to alter what the server advertises to the guest during the handshake phase; at which point the guest is out-of-spec if it tries to use the command that the plugin did not advertise. But while those other commands have no real impact to the amount of wire traffic, we want to always advertise WRITE_ZEROES support to the client (because it is so much more efficient to pass a WRITE_ZEROES command over the wire than a regular WRITE with a payload of zeroes), even when the plugin doesn't support it. Since having a can_zero callback does not affect what we advertise, the only other reason to implement it would be if can affect what we ask the plugin to do. 2) the fallocate() system call does not have any introspectible way to tell if it will work on a given file system, other than to try it and see if it fails with EOPNOTSUPP. On the C side, I thought it was elegant to let the errno value be a deciding factor on whether the attempt was fatal or whether to gracefully fall back to calling the pwrite callback, while still centralizing the calloc() logic in the plugin.c file rather than making every single C implementation of .zero repeat the logic. Since it is not easy to introspect whether fallocate() will fail on a given file, it's much harder to write a can_zero() function than it is to just try fallocate() and have a graceful fallback. 3) the NBD_CMD_WRITE_ZEROES command has a flag value that affects operation. The existing pread and pwrite callbacks do NOT have a flag argument (arguably, a hole in those specifications if we ever come up with a reason that we need to pass flags to those commands, but not necessarily something to worry about today). But the write zeroes command, per spec, has a way to state whether the server may use trim (provided that the hole reads back as zero) or must leave things allocated, giving the client some rudimentary control over whether the server will create a sparse file. The spec says that the server is not required to trim (so falling back to pwrite is always appropriate), but the way the spec is worded, using fallocate(FALLOC_FL_PUNCH_HOLE) is only appropriate when the may_trim flag is set in the existing C interface (which in turn corresponds to a lack of the NBD_CMD_FLAG_NO_HOLE flag on the wire). Having a boolean can_zero() callback is insufficient to tell whether the plugin would behave differently for may_trim=0 vs. may_trim=1 So here's some ideas for alternative implementations in the language bindings: 1) give the binding a way to request graceful fallback to write (what I implemented in this version) 2) provide a completely different interface to the language bindings, but keeping the C interface as already committed. The language binding would only implement a single new entry point, perhaps named trim_reads_as_zero(). On the C side of the binding, when plugin.c calls lang_zero(, may_trim=1), C checks if trim_reads_as_zero() is defined and returns true, and if so calls the existing trim() binding; if trim_reads_as-zero() is not defined, or returns false, or trim() is not defined, then C calls the existing pwrite() binding. Thus, languages do NOT need a zero() binding. But doing this may lose opportunities for optimizing an all-zero write even where fallocate(FALLOC_FL_PUNCH_HOLE) is inappropriate, as newer Linux has added fallocate(FALLOC_FL_ZERO_RANGE) [hmm, maybe I should do a followup patch to file.c to add the use of that option, but that's a story for another patch] 3) adjust the C interface to match the bindings interface proposed in 2, by making trim_reads_as_zero() part of the overall callback interface. We haven't released yet, so now's the time to tweak the interfaces, but like option 2, it makes it harder for an implementation to take advantage of faster ways to write large blocks of zeroes while still avoiding holes. 4) document that the language bindings can't use automatic fallback to write - they must implement it themselves. But unlike C having to do a verbose reimplementation of a call to calloc() and free(), the languages we bind to tend to have more compact representations for quickly creating a buffer of all zeros (perl's "\0" x count, python's bytearray(count), etc). I'm probably leaning towards number 4, since you expressed concern about number 1, but would love further opinions before I spend any more time churning out patches that aren't needed. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Reasonably Related Threads
- Re: [nbdkit PATCH 0/2] bind .zero to more languages
- Re: [nbdkit PATCH 0/2] bind .zero to more languages
- [nbdkit PATCH 0/2] bind .zero to more languages
- [PATCH nbdkit 2/3] sh: Switch nbdkit-sh-plugin to use API version 2.
- [PATCH nbdkit v2 3/4] sh: Switch nbdkit-sh-plugin to use API version 2.