Eric Blake
2019-Mar-23 16:25 UTC
Re: [Libguestfs] [PATCH nbdkit 1/8] server: Implement extents/can_extents calls for plugins and filters.
On 3/20/19 5:11 PM, Richard W.M. Jones wrote:> This pair of calls allows plugins to describe which extents in the > virtual disk are allocated, holes or zeroes. > ---I see you've already made a couple of tweaks to your block-status branch since posting this (skipping 0-legnth add_extent, and coalescing consecutive add_extents with the same type), but here's some more review:> +++ b/docs/nbdkit-filter.pod> +It is also possible for filters to transform the extents list received > +back from the layer below. Without error checking it would look like > +this: > + > + myfilter_extents (...) > + { > + size_t i; > + struct nbdkit_extents *extents2; > + struct nbdkit_extent e; > + > + extents2 = nbdkit_extents_new (offset); > + next_ops->extents (nxdata, count, offset, flags, extents2, err); > + for (i = 0; i < nbdkit_extents_size (extents2); ++i) { > + e = nbdkit_get_extent (extents2, i); > + e.offset = /* transform offset */; > + nbdkit_add_extent (extents, e.offset, e.length, e.type); > + } > + nbdkit_extents_free (extents2); > + } > + > +More complicated transformations may require the filter to allocate a > +new extents list using C<nbdkit_extents_new> and free the old one > +using C<nbdkit_extents_free>.This sentence sounds odd after the example. You'll probably want to reword this a bit. I see that you chose to track an extent per 'struct nbdkit_extent' rather than leaving it implicit based solely on what struct nbdkit_extents tracks, which means that the offset filter has to allocate a new one to pass to the plugin, AND has to adjust each extent before returning. Conceptually, you have something like: extents (starts at 1000): [ { offset=1000, length=100, type=0 }, { offset=1100, length=100, type=1 }, ... ] and since we allow plugins to add extents prior to the starting offset (as a no-op), the offset filter has to create a new filter with a new starting offset. As a thought experiment, I wonder if it would be any easier to have 'struct nbdkit_extents' track merely a next-expected offset, as well as a cumulative length, and then omit the offset from 'struct nbdkit_extent'; also, give filters a way to adjust the next expected offset. Then you could do something like: offsetfilter_extents (handle, offset=1000, count=500) { // extents has length=0, next=1000, array=[ ] // implied start=1000 nbdkit_extents_adjust_next (extents, h->offs); // extents has length=0, next=1000+h->offs, array=[ ] // implied start=1000+h->offs ret = next_ops->extents (nxdata, count, offsest + h->offs, flags, extents, err); // client populated extents: length=500, next=1500+h->offs, array=[ // { length=100, type=0 }, // offset would compute as 1000+h->offs // { length=100, type=1 }, // offset would compute as 1100+h->offs // ... // ], implied start=1000+h->offs nbdkit_extents_adjust_next (extents, -h->offs); // extents length=500, next=1500, array=[ // { length=100, type=0 }, // offset would compute as 1000 // { length=100, type=1 }, // offset would compute as 1100 // ... // ], implied start=1000 return ret; } Doing that may also make it easier to do write a filter that mixes its own holes with the underlying status of the plugin: // append hole owned by filter: nbdkit_extent_add(extents, hole_length, offset, hole) // change expected next offset for plugin int64_t delta = nbdkit_extents_adjust_next (extents, 0) // let plugin append its extents next_ops->extents (nxdata, count - hole_length, 0, flags, extents, err) // adjust things back to normal nbdkit_extents_adjust_next (extents, delta) Having typed that, it may require less computation time, but requires more thought about how it actually all works, so keeping your implementation (where the filter HAS to allocate a new extents, then copy the plugins answers back to its own, rather than being able to blindly reuse the plugins answers in place) is okay. As I said, mine was just a thought experiment for comparison, so it doesn't bother me if you don't use it.> + > +If there is an error, C<.extents> should call C<nbdkit_error> with an > +error message B<and> return -1 with C<err> set to the positive errno > +value to return to the client. > + > +=head3 Allocating and freeing nbdkit_extents list > + > +Two functions are provided to filters only for allocating and freeing > +the map: > + > + struct nbdkit_extents *nbdkit_extents_new (uint64_t start); > + > +Allocates and returns a new, empty extents list. The C<start> > +parameter is the start of the range described in the list. Normally > +you would pass in C<offset> as this parameter.Well, the offset that the plugin should start at (which might not be your C<offset>, if your filter is adjusting the plugin's offsets)> + > +On error this function can return C<NULL>. In this case it calls > +C<nbdkit_error> and/or C<nbdkit_set_error> as required. > + > + void nbdkit_extents_free (struct nbdkit_extents *); > + > +Frees an existing extents list. > + > +=head3 Iterating over nbdkit_extents list > + > +Two functions are provided to filters only to iterate over the extents > +in order: > + > + size_t nbdkit_extents_size (const struct nbdkit_extents *); > + > +Returns the number of extents in the list. > + > + struct nbdkit_extent { > + uint64_t offset; > + uint64_t length; > + uint32_t type; > + }; > + struct nbdkit_extent nbdkit_get_extent (const struct nbdkit_extents *, > + size_t i); > + > +Returns a copy of the C<i>'th extent.If you go with my thought experiment, it gets harder to provide this signature. This signature is certainly easier than the foreach() iterator with callback function of your previous version, though, so I do like how enforcing the client to work in ascending contiguous order simplified things.> + > =head1 ERROR HANDLING > > If there is an error in the filter itself, the filter should call > diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod > index 47545f3..a603f8b 100644 > --- a/docs/nbdkit-plugin.pod > +++ b/docs/nbdkit-plugin.pod > @@ -555,6 +555,20 @@ This callback is not required. If omitted, then nbdkit always tries > C<.zero> first if it is present, and gracefully falls back to > C<.pwrite> if C<.zero> was absent or failed with C<EOPNOTSUPP>. > > +=head2 C<.can_extents> > + > + int can_extents (void *handle); > + > +This is called during the option negotiation phase to find out if the > +plugin supports detecting allocated (non-sparse) regions of the disk > +with the C<.extents> callback. > + > +If there is an error, C<.can_extents> should call C<nbdkit_error> with > +an error message and return C<-1>. > + > +This callback is not required. If omitted, then we return true iff a > +C<.extents> callback has been defined.Should we default this to returning true always, _because_ we have the sane fallback of treating the entire image as allocated when .extents is absent? (That is, a plugin has to explicitly opt out of advertising NBD_CMD_BLOCK_STATUS support, because our default works well enough even if the plugin didn't do anything). It would match how filters can call next_ops->zero() even for plugins that do not have .zero (because .can_zero defaults to true).> +> +Extents overlapping the range C<[offset...offset+count-1]> should be > +returned if possible. However nbdkit ignores extents E<lt> offset so > +the plugin may, if it is easier to implement, return all extent > +information for the whole disk. The plugin may return extents beyond > +the end of the range. It may also return extent information for less > +than the whole range, but it must return at least one extent.must return at least one extent overlapping with C<[offset]>. (if the plugin returns all extent information prior to offset, but stops before offset is reached, that counts as not returning any extents).> + > +The extents B<must> be added in ascending order, and B<must> be > +contiguous. > + > +The C<flags> parameter may contain the flag C<NBDKIT_FLAG_REQ_ONE> > +which means that the client is only requesting information about the > +extent overlapping C<offset>. The plugin may ignore this flag, or as > +an optimization it may return just a single extent for C<offset>. > +(Note that implementing this optimization correctly is difficult and > +subtle. You must ensure that the upper limit of the single extent > +returned is correct. If unsure then the safest option is to ignore > +this flag.)I'm not sure we still need this parenthetical - with your simpler implementation, I think it is harder for plugins to get things wrong (the NBD protocol does NOT require the upper limit to be accurate, merely that it makes progress - although the NBD protocol recommends that servers return maximal upper bounds, it also warns clients that they must be prepared to coalesce short returns with identical type).> +++ b/include/nbdkit-filter.h > @@ -1,5 +1,5 @@ > /* nbdkit > - * Copyright (C) 2013-2018 Red Hat Inc. > + * Copyright (C) 2013-2019 Red Hat Inc. > * All rights reserved. > * > * Redistribution and use in source and binary forms, with or without > @@ -44,6 +44,18 @@ extern "C" { > > #define NBDKIT_FILTER_API_VERSION 2 > > +struct nbdkit_extent { > + uint64_t offset; > + uint64_t length; > + uint32_t type; > +}; > + > +extern struct nbdkit_extents *nbdkit_extents_new (uint64_t start); > +extern void nbdkit_extents_free (struct nbdkit_extents *); > +extern size_t nbdkit_extents_size (const struct nbdkit_extents *);'size' makes me wonder if this is the number of bytes covered by the extents object, rather than the number of extent objects in the array. Would calling it 'nbdkit_extents_count' be any easier to understand?> +++ b/server/extents.c > @@ -0,0 +1,171 @@> +struct nbdkit_extents { > + struct nbdkit_extent *extents; > + size_t nr_extents, allocated; > + uint64_t start; > +};Is it also worth tracking cumulative length covered by extents? Tracking it would give us an O(1) instead of O(n) answer to "how many bytes did the plugin tell us about" - which MAY matter to filters that want to append additional data about a hole after the underlying plugin (so the filter can tell if the plugin filled in the entire range that the filter requested, or if it filled in only a subset). But then again, if the filter has to allocate a new map to pass to the plugin, and then postprocess to copy from the temporary map back to the filter's incoming map, the filter is already doing O(n) work and can answer the question itself. (My thought proposal was a way to let such a filter do O(1) work by letting the client append directly into the same list that the filter will also be appending to - but I'm not sure it is necessary for the complexity it introduces). Is it worth tracking a maximum that the client asked for? When REQ_ONE is in effect, the maximum is a hard stop for what we answer to the client (but we should still let the client give more than we asked for); when REQ_ONE is not in effect, the protocol says that we can't add any more extents of a new type (we can only coalesce a larger end length to the existing last extent). Either way, tracking a maximum means that we could avoid wasting time and malloc()s on extents where the plugin gave us more information than we need, for any nbdkit_add_extent() with an offset beyond the maximum that does not coalesce with the existing final extent.> + > +size_t > +nbdkit_extents_size (const struct nbdkit_extents *exts) > +{ > + return exts->nr_extents; > +}Again, would nbdkit_extents_count() be a nicer name for this one?> +int > +nbdkit_add_extent (struct nbdkit_extents *exts, > + uint64_t offset, uint64_t length, uint32_t type) > +{ > + uint64_t overlap; > + > + /* If there are existing extents, the new extent must be contiguous. */or else ignored because it is beyond a maximum, if you like my idea of tracking the maximum the client cared about. Here's also where you added code to ignore 0-length plugin calls.> + > + const struct nbdkit_extent e > + { .offset = offset, .length = length, .type = type }; > + return append_extent (exts, &e);and here is where you added in code to coalesce plugin calls that returned the same type.> +++ b/server/plugins.c > @@ -1,5 +1,5 @@> > +static int > +plugin_can_extents (struct backend *b, struct connection *conn) > +{ > + struct backend_plugin *p = container_of (b, struct backend_plugin, backend); > + > + assert (connection_get_handle (conn, 0)); > + > + debug ("can_extents"); > + > + if (p->plugin.can_extents) > + return p->plugin.can_extents (connection_get_handle (conn, 0)); > + else > + return p->plugin.extents != NULL;Again, making this default to true may be nicer, where a plugin has to opt out of our sane default behavior. We're getting closer. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2019-Mar-25 17:01 UTC
Re: [Libguestfs] [PATCH nbdkit 1/8] server: Implement extents/can_extents calls for plugins and filters.
On Sat, Mar 23, 2019 at 11:25:06AM -0500, Eric Blake wrote:> On 3/20/19 5:11 PM, Richard W.M. Jones wrote: > > + myfilter_extents (...) > > + { > > + size_t i; > > + struct nbdkit_extents *extents2; > > + struct nbdkit_extent e; > > + > > + extents2 = nbdkit_extents_new (offset); > > + next_ops->extents (nxdata, count, offset, flags, extents2, err); > > + for (i = 0; i < nbdkit_extents_size (extents2); ++i) { > > + e = nbdkit_get_extent (extents2, i); > > + e.offset = /* transform offset */; > > + nbdkit_add_extent (extents, e.offset, e.length, e.type); > > + } > > + nbdkit_extents_free (extents2); > > + } > > + > > +More complicated transformations may require the filter to allocate a > > +new extents list using C<nbdkit_extents_new> and free the old one > > +using C<nbdkit_extents_free>. > > This sentence sounds odd after the example. You'll probably want to > reword this a bit.Yes, that sentence was left over after I changed the example, so it now doesn't make any sense. I'll remove it, and I'm also going to fix the example above it so it shows how to adjust offsets in the returned extents map. In the rest of the email you suggested a different and a bit more complex struct nbdkit_extents implementation. I think that we should concentrate on getting the plugin API right, and at the moment the only API we've defined as safe for plugins is: extern int nbdkit_add_extent (struct nbdkit_extents *, uint64_t offset, uint64_t length, uint32_t type); plus the contract that offsets must be added contiguously and cover the range from <= offset. I think we can refine the filter API later since we are not bound by ABI considerations for filters. Although I do wish we could constrain plugins to only using that API somehow ...> > + struct nbdkit_extents *nbdkit_extents_new (uint64_t start); > > + > > +Allocates and returns a new, empty extents list. The C<start> > > +parameter is the start of the range described in the list. Normally > > +you would pass in C<offset> as this parameter. > > Well, the offset that the plugin should start at (which might not be > your C<offset>, if your filter is adjusting the plugin's offsets)Very true, I've adjusted the text and the example.> > +=head2 C<.can_extents> > > + > > + int can_extents (void *handle); > > + > > +This is called during the option negotiation phase to find out if the > > +plugin supports detecting allocated (non-sparse) regions of the disk > > +with the C<.extents> callback. > > + > > +If there is an error, C<.can_extents> should call C<nbdkit_error> with > > +an error message and return C<-1>. > > + > > +This callback is not required. If omitted, then we return true iff a > > +C<.extents> callback has been defined. > > Should we default this to returning true always, _because_ we have the > sane fallback of treating the entire image as allocated when .extents is > absent? (That is, a plugin has to explicitly opt out of advertising > NBD_CMD_BLOCK_STATUS support, because our default works well enough even > if the plugin didn't do anything). It would match how filters can call > next_ops->zero() even for plugins that do not have .zero (because > .can_zero defaults to true).There is definitely a case for removing can_extents completely, and relying on the fallback. Would this affect language bindings? I think no because in the language bindings we can also emulate the .extents callback. Associated with this change would be to change the server so it always advertises and enables base:allocation (if the client requests it). I'm trying to think if there's any reason not to make both of those changes, and failing to think of a reason at the moment ...> > +Extents overlapping the range C<[offset...offset+count-1]> should be > > +returned if possible. However nbdkit ignores extents E<lt> offset so > > +the plugin may, if it is easier to implement, return all extent > > +information for the whole disk. The plugin may return extents beyond > > +the end of the range. It may also return extent information for less > > +than the whole range, but it must return at least one extent. > > must return at least one extent overlapping with C<[offset]>. > > (if the plugin returns all extent information prior to offset, but stops > before offset is reached, that counts as not returning any extents).Yup, will adjust this text.> > +The extents B<must> be added in ascending order, and B<must> be > > +contiguous. > > + > > +The C<flags> parameter may contain the flag C<NBDKIT_FLAG_REQ_ONE> > > +which means that the client is only requesting information about the > > +extent overlapping C<offset>. The plugin may ignore this flag, or as > > +an optimization it may return just a single extent for C<offset>. > > +(Note that implementing this optimization correctly is difficult and > > +subtle. You must ensure that the upper limit of the single extent > > +returned is correct. If unsure then the safest option is to ignore > > +this flag.) > > I'm not sure we still need this parenthetical - with your simpler > implementation, I think it is harder for plugins to get things wrong > (the NBD protocol does NOT require the upper limit to be accurate, > merely that it makes progress - although the NBD protocol recommends > that servers return maximal upper bounds, it also warns clients that > they must be prepared to coalesce short returns with identical type).Yes, agreed, will remove.> > +++ b/include/nbdkit-filter.h > > @@ -1,5 +1,5 @@ > > /* nbdkit > > - * Copyright (C) 2013-2018 Red Hat Inc. > > + * Copyright (C) 2013-2019 Red Hat Inc. > > * All rights reserved. > > * > > * Redistribution and use in source and binary forms, with or without > > @@ -44,6 +44,18 @@ extern "C" { > > > > #define NBDKIT_FILTER_API_VERSION 2 > > > > +struct nbdkit_extent { > > + uint64_t offset; > > + uint64_t length; > > + uint32_t type; > > +}; > > + > > +extern struct nbdkit_extents *nbdkit_extents_new (uint64_t start); > > +extern void nbdkit_extents_free (struct nbdkit_extents *); > > +extern size_t nbdkit_extents_size (const struct nbdkit_extents *); > > 'size' makes me wonder if this is the number of bytes covered by the > extents object, rather than the number of extent objects in the array. > Would calling it 'nbdkit_extents_count' be any easier to understand?Yup, I'll change this.> > +++ b/server/extents.c > > @@ -0,0 +1,171 @@ > > > +struct nbdkit_extents { > > + struct nbdkit_extent *extents; > > + size_t nr_extents, allocated; > > + uint64_t start; > > +}; > > Is it also worth tracking cumulative length covered by extents? > Tracking it would give us an O(1) instead of O(n) answer to "how many > bytes did the plugin tell us about" - which MAY matter to filters that > want to append additional data about a hole after the underlying pluginI may have misunderstood, but isn't that given by: extents[nr_extents-1].offset + extents[nr_extents-1].length - start ? Anyway at the moment I'm not greatly worried about the implementation of extents or how filters are implemented, as long as we have got the plugin API right. Rich -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Richard W.M. Jones
2019-Mar-25 17:10 UTC
Re: [Libguestfs] [PATCH nbdkit 1/8] server: Implement extents/can_extents calls for plugins and filters.
On Mon, Mar 25, 2019 at 05:01:39PM +0000, Richard W.M. Jones wrote:> > > +=head2 C<.can_extents>[...]> There is definitely a case for removing can_extents completely, and > relying on the fallback. Would this affect language bindings? I > think no because in the language bindings we can also emulate the > .extents callback. > > Associated with this change would be to change the server so it always > advertises and enables base:allocation (if the client requests it). > > I'm trying to think if there's any reason not to make both of those > changes, and failing to think of a reason at the moment ...Actually I can think of one reason we might want to keep can_extents, and that's the case where a plugin may be able to implement extents in some cases but not others (the file plugin in the current series is an example). However there is still the question of if we should always advertise base:allocation, and I think the answer there is yes we should always advertise that (to clients that ask for it). Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Eric Blake
2019-Mar-25 17:23 UTC
Re: [Libguestfs] [PATCH nbdkit 1/8] server: Implement extents/can_extents calls for plugins and filters.
On 3/25/19 12:01 PM, Richard W.M. Jones wrote:>>> +This callback is not required. If omitted, then we return true iff a >>> +C<.extents> callback has been defined. >> >> Should we default this to returning true always, _because_ we have the >> sane fallback of treating the entire image as allocated when .extents is >> absent? (That is, a plugin has to explicitly opt out of advertising >> NBD_CMD_BLOCK_STATUS support, because our default works well enough even >> if the plugin didn't do anything). It would match how filters can call >> next_ops->zero() even for plugins that do not have .zero (because >> .can_zero defaults to true). > > There is definitely a case for removing can_extents completely, and > relying on the fallback. Would this affect language bindings? I > think no because in the language bindings we can also emulate the > .extents callback. > > Associated with this change would be to change the server so it always > advertises and enables base:allocation (if the client requests it). > > I'm trying to think if there's any reason not to make both of those > changes, and failing to think of a reason at the moment ...The 'fua' filter is a case where not advertising something, for the purpose of timing comparisons (basically, comparisons of whether it is more efficient for the client to always be told that the entire image is allocated, vs. the client having to assume that the entire image is allocated because it can't even query), is the only reason I can think of for why we would want to let a filter/plugin forcefully tell nbdkit to not advertise block status.>>> +++ b/server/extents.c >>> @@ -0,0 +1,171 @@ >> >>> +struct nbdkit_extents { >>> + struct nbdkit_extent *extents; >>> + size_t nr_extents, allocated; >>> + uint64_t start; >>> +}; >> >> Is it also worth tracking cumulative length covered by extents? >> Tracking it would give us an O(1) instead of O(n) answer to "how many >> bytes did the plugin tell us about" - which MAY matter to filters that >> want to append additional data about a hole after the underlying plugin > > I may have misunderstood, but isn't that given by: > extents[nr_extents-1].offset + extents[nr_extents-1].length - start ?Oh, good call. So as long as .offset is part of each nbdkit_extent, yes, a filter can compute total length in O(1) instead of O(n). (If nbdkit_extent tracks only length, then you have to iterate over all n extents to add the cumulative length back up).> > Anyway at the moment I'm not greatly worried about the implementation > of extents or how filters are implemented, as long as we have got the > plugin API right.I also think you are right that we can change the implementation of extents without hurting plugins (even if we change the content of struct nbdkit_extent, only filters use that struct). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Seemingly Similar Threads
- [PATCH nbdkit 1/8] server: Implement extents/can_extents calls for plugins and filters.
- Re: [PATCH nbdkit 1/8] server: Implement extents/can_extents calls for plugins and filters.
- [PATCH nbdkit v4 01/15] server: Implement extents/can_extents calls for plugins and filters.
- [PATCH nbdkit v5 FINAL 01/19] server: Implement extents/can_extents calls for plugins and filters.
- [PATCH nbdkit 4/8] offset: Implement mapping of extents.