Eric Blake
2017-Jan-24 14:51 UTC
Re: [Libguestfs] [nbdkit PATCH 0/2] bind .zero to more languages
On 01/24/2017 05:38 AM, Richard W.M. Jones wrote:> On Mon, Jan 23, 2017 at 09:13:23PM -0600, Eric Blake wrote: >> Begin the language binding followups to my new .zero callback, since >> Rich was indeed correct that we want them. >> >> I'm more familiar with python and perl (at least to the point that >> I was able to modify the appropriate example files and prove to >> myself that the bindings worked), so I've started with those. >> I'm less familiar with ruby and ocaml, so I've left those for >> tomorrow (it will rely heavily on copy-and-paste); it is also a >> chance to review if my decision of requiring a return value in >> order to trigger the graceful fallback is the best approach. > > The other methods (eg. trim) deal with this by having a separate > method called "can_*" (eg. can_trim). Can we have a can_zero callback > instead?I thought about that, but there's several problems: 1) the can_trim and similar functions are required in order to alter what the server advertises to the guest during the handshake phase; at which point the guest is out-of-spec if it tries to use the command that the plugin did not advertise. But while those other commands have no real impact to the amount of wire traffic, we want to always advertise WRITE_ZEROES support to the client (because it is so much more efficient to pass a WRITE_ZEROES command over the wire than a regular WRITE with a payload of zeroes), even when the plugin doesn't support it. Since having a can_zero callback does not affect what we advertise, the only other reason to implement it would be if can affect what we ask the plugin to do. 2) the fallocate() system call does not have any introspectible way to tell if it will work on a given file system, other than to try it and see if it fails with EOPNOTSUPP. On the C side, I thought it was elegant to let the errno value be a deciding factor on whether the attempt was fatal or whether to gracefully fall back to calling the pwrite callback, while still centralizing the calloc() logic in the plugin.c file rather than making every single C implementation of .zero repeat the logic. Since it is not easy to introspect whether fallocate() will fail on a given file, it's much harder to write a can_zero() function than it is to just try fallocate() and have a graceful fallback. 3) the NBD_CMD_WRITE_ZEROES command has a flag value that affects operation. The existing pread and pwrite callbacks do NOT have a flag argument (arguably, a hole in those specifications if we ever come up with a reason that we need to pass flags to those commands, but not necessarily something to worry about today). But the write zeroes command, per spec, has a way to state whether the server may use trim (provided that the hole reads back as zero) or must leave things allocated, giving the client some rudimentary control over whether the server will create a sparse file. The spec says that the server is not required to trim (so falling back to pwrite is always appropriate), but the way the spec is worded, using fallocate(FALLOC_FL_PUNCH_HOLE) is only appropriate when the may_trim flag is set in the existing C interface (which in turn corresponds to a lack of the NBD_CMD_FLAG_NO_HOLE flag on the wire). Having a boolean can_zero() callback is insufficient to tell whether the plugin would behave differently for may_trim=0 vs. may_trim=1 So here's some ideas for alternative implementations in the language bindings: 1) give the binding a way to request graceful fallback to write (what I implemented in this version) 2) provide a completely different interface to the language bindings, but keeping the C interface as already committed. The language binding would only implement a single new entry point, perhaps named trim_reads_as_zero(). On the C side of the binding, when plugin.c calls lang_zero(, may_trim=1), C checks if trim_reads_as_zero() is defined and returns true, and if so calls the existing trim() binding; if trim_reads_as-zero() is not defined, or returns false, or trim() is not defined, then C calls the existing pwrite() binding. Thus, languages do NOT need a zero() binding. But doing this may lose opportunities for optimizing an all-zero write even where fallocate(FALLOC_FL_PUNCH_HOLE) is inappropriate, as newer Linux has added fallocate(FALLOC_FL_ZERO_RANGE) [hmm, maybe I should do a followup patch to file.c to add the use of that option, but that's a story for another patch] 3) adjust the C interface to match the bindings interface proposed in 2, by making trim_reads_as_zero() part of the overall callback interface. We haven't released yet, so now's the time to tweak the interfaces, but like option 2, it makes it harder for an implementation to take advantage of faster ways to write large blocks of zeroes while still avoiding holes. 4) document that the language bindings can't use automatic fallback to write - they must implement it themselves. But unlike C having to do a verbose reimplementation of a call to calloc() and free(), the languages we bind to tend to have more compact representations for quickly creating a buffer of all zeros (perl's "\0" x count, python's bytearray(count), etc). I'm probably leaning towards number 4, since you expressed concern about number 1, but would love further opinions before I spend any more time churning out patches that aren't needed. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Richard W.M. Jones
2017-Jan-24 15:16 UTC
Re: [Libguestfs] [nbdkit PATCH 0/2] bind .zero to more languages
On Tue, Jan 24, 2017 at 08:51:05AM -0600, Eric Blake wrote:> On 01/24/2017 05:38 AM, Richard W.M. Jones wrote: > > On Mon, Jan 23, 2017 at 09:13:23PM -0600, Eric Blake wrote: > >> Begin the language binding followups to my new .zero callback, since > >> Rich was indeed correct that we want them. > >> > >> I'm more familiar with python and perl (at least to the point that > >> I was able to modify the appropriate example files and prove to > >> myself that the bindings worked), so I've started with those. > >> I'm less familiar with ruby and ocaml, so I've left those for > >> tomorrow (it will rely heavily on copy-and-paste); it is also a > >> chance to review if my decision of requiring a return value in > >> order to trigger the graceful fallback is the best approach. > > > > The other methods (eg. trim) deal with this by having a separate > > method called "can_*" (eg. can_trim). Can we have a can_zero callback > > instead? > > I thought about that, but there's several problems: > > 1) the can_trim and similar functions are required in order to alter > what the server advertises to the guest during the handshake phase; at > which point the guest is out-of-spec if it tries to use the command that > the plugin did not advertise. But while those other commands have no > real impact to the amount of wire traffic, we want to always advertise > WRITE_ZEROES support to the client (because it is so much more efficient > to pass a WRITE_ZEROES command over the wire than a regular WRITE with a > payload of zeroes), even when the plugin doesn't support it. Since > having a can_zero callback does not affect what we advertise, the only > other reason to implement it would be if can affect what we ask the > plugin to do. > > 2) the fallocate() system call does not have any introspectible way to > tell if it will work on a given file system, other than to try it and > see if it fails with EOPNOTSUPP. On the C side, I thought it was > elegant to let the errno value be a deciding factor on whether the > attempt was fatal or whether to gracefully fall back to calling the > pwrite callback, while still centralizing the calloc() logic in the > plugin.c file rather than making every single C implementation of .zero > repeat the logic. Since it is not easy to introspect whether > fallocate() will fail on a given file, it's much harder to write a > can_zero() function than it is to just try fallocate() and have a > graceful fallback. > > 3) the NBD_CMD_WRITE_ZEROES command has a flag value that affects > operation. The existing pread and pwrite callbacks do NOT have a flag > argument (arguably, a hole in those specifications if we ever come up > with a reason that we need to pass flags to those commands, but not > necessarily something to worry about today). But the write zeroes > command, per spec, has a way to state whether the server may use trim > (provided that the hole reads back as zero) or must leave things > allocated, giving the client some rudimentary control over whether the > server will create a sparse file. The spec says that the server is not > required to trim (so falling back to pwrite is always appropriate), but > the way the spec is worded, using fallocate(FALLOC_FL_PUNCH_HOLE) is > only appropriate when the may_trim flag is set in the existing C > interface (which in turn corresponds to a lack of the > NBD_CMD_FLAG_NO_HOLE flag on the wire). Having a boolean can_zero() > callback is insufficient to tell whether the plugin would behave > differently for may_trim=0 vs. may_trim=1 > > So here's some ideas for alternative implementations in the language > bindings: > > 1) give the binding a way to request graceful fallback to write (what I > implemented in this version) > > 2) provide a completely different interface to the language bindings, > but keeping the C interface as already committed. The language binding > would only implement a single new entry point, perhaps named > trim_reads_as_zero(). On the C side of the binding, when plugin.c calls > lang_zero(, may_trim=1), C checks if trim_reads_as_zero() is defined and > returns true, and if so calls the existing trim() binding; if > trim_reads_as-zero() is not defined, or returns false, or trim() is not > defined, then C calls the existing pwrite() binding. Thus, languages do > NOT need a zero() binding. But doing this may lose opportunities for > optimizing an all-zero write even where fallocate(FALLOC_FL_PUNCH_HOLE) > is inappropriate, as newer Linux has added > fallocate(FALLOC_FL_ZERO_RANGE) [hmm, maybe I should do a followup patch > to file.c to add the use of that option, but that's a story for another > patch] > > 3) adjust the C interface to match the bindings interface proposed in 2, > by making trim_reads_as_zero() part of the overall callback interface. > We haven't released yet, so now's the time to tweak the interfaces, but > like option 2, it makes it harder for an implementation to take > advantage of faster ways to write large blocks of zeroes while still > avoiding holes. > > 4) document that the language bindings can't use automatic fallback to > write - they must implement it themselves. But unlike C having to do a > verbose reimplementation of a call to calloc() and free(), the languages > we bind to tend to have more compact representations for quickly > creating a buffer of all zeros (perl's "\0" x count, python's > bytearray(count), etc). > > I'm probably leaning towards number 4, since you expressed concern about > number 1, but would love further opinions before I spend any more time > churning out patches that aren't needed.Another option -- simpler, I think -- is to review the way that plugins set errors. Currently there is only one call (nbdkit_error) which takes a string. Thus if the plugin error comes with an errno, it is effectively lost (or at best, turned into a string, but not in a way that callers can use). However we could add another API: nbdkit_set_errno (int errno); This allows the plugin to send the errno back to the core code explicitly. It's effectively stored as a global in the core code. Note that the correct calling sequence from C is: nbdkit_set_error (errno); // this call is optional and may be omitted nbdkit_error ("oops: %s", strerror (errno)); return -1; Mapping these to the language bindings should be easy. We just need to add a binding to nbdkit_set_error, but the exception mechanism for indicating an error and sending the string can stay the same. In Perl, old code to raise an error would look like: sub pread { my $h = shift; my $count = shift; my $offset = shift; my $ret; read ($FH, $ret, $count, $offset) || die "read: $!" return $ret; } (Note that die does not exit the program, it raises an exception which is caught and turned into nbdkit_error in the Perl nbdkit plugin). This would continue to work fine, but new code which cared about the errno could do this instead: sub pread { my $h = shift; my $count = shift; my $offset = shift; my $ret; read ($FH, $ret, $count, $offset) || { nbdkit_set_errno (POSIX::errno ()); die "read: $!" } return $ret; } Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
Richard W.M. Jones
2017-Jan-24 15:41 UTC
Re: [Libguestfs] [nbdkit PATCH 0/2] bind .zero to more languages
On Tue, Jan 24, 2017 at 03:16:45PM +0000, Richard W.M. Jones wrote:> However we could add another API: > > nbdkit_set_errno (int errno); > > This allows the plugin to send the errno back to the core code > explicitly. It's effectively stored as a global in the core code.What I'm really saying here is that at the moment (after your previous patch series) we have an implicit channel where errno is passed from the plugin to the core code. We should simply make that channel explicit and controlled by the plugin. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Eric Blake
2017-Jan-26 02:48 UTC
Re: [Libguestfs] [nbdkit PATCH 0/2] bind .zero to more languages
On 01/24/2017 09:16 AM, Richard W.M. Jones wrote:> In Perl, old code to raise an error would look like: > > sub pread > { > my $h = shift; > my $count = shift; > my $offset = shift; > my $ret; > read ($FH, $ret, $count, $offset) || die "read: $!" > return $ret; > } > > (Note that die does not exit the program, it raises an exception which > is caught and turned into nbdkit_error in the Perl nbdkit plugin). > > This would continue to work fine, but new code which cared about the > errno could do this instead: > > sub pread > { > my $h = shift; > my $count = shift; > my $offset = shift; > my $ret; > read ($FH, $ret, $count, $offset) || { > nbdkit_set_errno (POSIX::errno ()); > die "read: $!" > } > return $ret; > }Except I can't even figure out how to expose nbdkit_set_error (hmm, I named it set_error instead of set_errno in my v2 series) to the perl code. In fact, with just this change to example.pl: diff --git i/plugins/perl/example.pl w/plugins/perl/example.pl index fcdac33..64f6de3 100644 --- i/plugins/perl/example.pl +++ w/plugins/perl/example.pl @@ -1,4 +1,5 @@ use strict; +use POSIX (); # Example Perl plugin. # @@ -80,6 +81,8 @@ sub pwrite my $buf = shift; my $count = length ($buf); my $offset = shift; + my $err = POSIX::EPERM; substr ($disk, $offset, $count) = $buf; + die "forced write failure"; } I'm getting this failure: $ ./src/nbdkit -e foo -fv plugins/perl/.libs/nbdkit-perl-plugin.so \ script=plugins/perl/example.pl nbdkit: debug: registering plugins/perl/.libs/nbdkit-perl-plugin.so nbdkit: debug: registered plugins/perl/.libs/nbdkit-perl-plugin.so (name perl) nbdkit: debug: plugins/perl/.libs/nbdkit-perl-plugin.so: load nbdkit: debug: plugins/perl/.libs/nbdkit-perl-plugin.so: config key=script, value=plugins/perl/example.pl Can't load module Fcntl, dynamic loading not available in this perl. (You may need to build a new perl executable which either supports dynamic loading or has the Fcntl module statically linked into it.) at /usr/lib64/perl5/POSIX.pm line 17. Compilation failed in require at /usr/lib64/perl5/POSIX.pm line 17. BEGIN failed--compilation aborted at /usr/lib64/perl5/POSIX.pm line 17. Compilation failed in require at plugins/perl/example.pl line 2. BEGIN failed--compilation aborted at plugins/perl/example.pl line 2. nbdkit: error: plugins/perl/example.pl: one of the required callbacks 'open', 'get_size' or 'pread' is not defined by this Perl script. nbdkit requires these callbacks. So I'm not sure how anyone else is doing anything fancy in an NBD perl script. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org