thr3ads.net - Libguestfs - Re: [Libguestfs] [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters. [Mar 2019]

If this information is useful, please help other people find it:
Share via:

Eric Blake

2019-Mar-12 18:58 UTC

Re: [Libguestfs] [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.

On 3/12/19 12:11 PM, Richard W.M. Jones wrote:> This pair of calls allows plugins to describe which extents in the
> virtual disk are allocated, holes or zeroes.
> ---
> +++ b/docs/nbdkit-filter.pod
> +The C<extents_map> parameter passed to this function is empty.
True only if any earlier filter also passed in an empty map. Maybe it's
worth explicitly mentioning that the filter must in turn pass an empty
map if it calls next_ops?
>  If the
> +filter does not need to adjust extents from the underlying plugin it
> +can simply pass this through to the layer below:
> +
> + myfilter_extents (...)
> + {
> +   return next_ops->extents (nxdata, count, offset, flags, extents_map,
err);
> + }
> +
> +It is also possible for filters to transform the extents map received
> +back from the layer below.  Usually this must be done by allocating a
> +new map which is passed to the layer below.  Without error handling it
> +would look like this:
> +
> + myfilter_extents (...)
> + {
> +   struct nbdkit_extents_map *map2 = nbdkit_extents_new ();
> +   next_ops->extents (nxdata, count, offset, flags, map2, err);
> +   /* transform map2 and return results in extents_map */
> +   nbdkit_extents_foreach (map2, transform_offset, extents_map);
> +   nbdkit_extents_free (map2);
> + }
And the fact that we can easily call nbdkit_extents_new() as needed
means that a filter could, for example, call next_ops->extents() even
for its pread implementation (of course, only for a plugin that
.can_extents), if the filter knowing which parts of the plugin are
sparse can make the filter run more efficiently on calls other than
NBD_CMD_BLOCK_STATUS.

> +=head2 C<.extents>
> +
> + int extents (void *handle, uint32_t count, uint64_t offset,
> +              uint32_t flags,
> +              struct nbdkit_extents_map *extents_map);
> +
> +During the data serving phase, this callback is used to
> +detect allocated (non-sparse) regions of the disk.
> +
> +This function will not be called if C<.can_extents> returned false.
Is it worth mentioning that nbdkit's default behavior is to treat the
entire image as allocated non-zero if .can_extents returns false?
> +
> +The callback should detect and return the list of extents overlapping
> +the range C<[offset...offset+count-1]>.  The C<extents_map>
parameter
> +points to an opaque object which the callback should fill in by
> +calling C<nbdkit_extent_add> etc.  See L</Extents map> below.
> +
> +The C<flags> parameter may contain the flag
C<NBDKIT_FLAG_REQ_ONE>
> +which means that the client is only requesting information about the
> +extent overlapping C<offset>.  The plugin may ignore this flag, or
as
> +an optimization it may return just a single extent for C<offset>.
> +
> +If there is an error, C<.extents> should call C<nbdkit_error>
with an
> +error message, and C<nbdkit_set_error> to record an appropriate
error
> +(unless C<errno> is sufficient), then return C<-1>.
> +
> +=head3 Extents map
> +
> +The plugin C<extents> callback is passed an opaque pointer
C<struct
> +nbdkit_extents_map *extents_map>.  This structure represents a list of
> +L<filesystem
extents|https://en.wikipedia.org/wiki/Extent_(file_systems)>
> +describing which areas of the disk are allocated, which are sparse
> +(“holes”), and, if supported, which are zeroes.  Initially the map
> +describes a single hole covering the entire virtual disk.
Is an initial hole the wisest default, or should the initial state be
data, and the plugin can do an nbdkit_extent_add(0, size, HOLE) if the
client finds it easier to start with all holes? After all, the NBD
protocol states that returning data/allocated is always the safest
default (returning hole and/or read-zeroes should only be done when we
are sure about it, but returning data/allocated is fine even if it
causes the client to be less efficient for reading holes that it could
have otherwise skipped).
> +
> +The C<extents> callback has to fill in any allocated or zeroes areas
> +by calling C<nbdkit_extent*> functions, to describe all extents
> +overlapping the range C<[offset...offset+count-1]>.  (nbdkit will
> +ignore extents outside this range so plugins may, if it is easier to
> +implement, return all extent information for the whole disk.)
Is it worth mentioning that speed is slightly more important than
accuracy (that is, it's better to quickly return DATA than to slowly
read the are in question and memcmp() to see if it is all zeros in order
to return a more precise ZERO; ideally, returning anything other than
DATA should only be done when the time collecting that information is
proportionate to the number of requests rather to the size of the region
being queried).

Is nbdkit allowed to cache the information learned from the plugin for
regions outside the client's initial query for later use, or are we
better off always asking for information again (even if the answer we
get will be the same)?
> +
> +The following functions are available for plugins or filters to call:
> +
> + int nbdkit_extent_add (struct nbdkit_extents_map *extents_map,
> +                        uint64_t offset, uint64_t length, uint32_t type);
> +
> +Add an extent covering C<[offset...offset+length-1]> of one of
> +the following types:
> +
> +=over 4
> +
> +=item NBDKIT_EXTENT_TYPE_HOLE
> +
> +An unallocated extent, a.k.a. a “hole”.
> +
> +=item NBDKIT_EXTENT_TYPE_DATA
> +
> +A normal extent containing data.
> +
> +=item NBDKIT_EXTENT_TYPE_ZERO
> +
> +An extent which is allocated and contains all zero bytes.
The protocol allows all four combinations of the two bits (you could
have a non-allocated area not known to read as zeroes, for example, some
SCSI disks have this property); should we likewise have four potential
different types, rather than just three?
> +
> +=back
> +
> +If the new extent overlaps some previous extent in the map then the
> +overlapping part(s) are overwritten with the new type.  Adjacent
> +extents of the same type may also be coalesced.  Thus you shouldn't
> +assume that C<nbdkit_extent_add> always adds exactly one more extent
> +to the map.
That's a nice aspect - it means a client CAN call add(0, full_size,
HOLE) and then refine it with subsequent add(offset, length, DATA). It
also seems like a fairly straightforward implementation - keep a sorted
list of extents, and then split/coalescse into the right sorted location
as more add() requests come in.  Then it is an implementation detail
whether we track the list by array, linked list, or binary tree, which
in turn may be determined by how many extents we expect to be dealing
with (choosing an implementation with O(log n) lookup/insertion rather
than O(n) in case the client gives us LOTS of add() calls may suggest
that a tree is best, but prototyping with an array for the initial cut
is less code complexity).
> +
> +C<nbdkit_extent_add> returns C<0> on success or C<-1> on
failure.  On
> +failure C<nbdkit_error> and/or C<nbdkit_set_error> has already
been
> +called.
> +
> + void nbdkit_extents_clear (struct nbdkit_extents_map *extents_map);
> +
> +Clear the extents map.  After this the map will have a single hole
> +covering the entire disk.
> +
> + int nbdkit_extents_foreach (const struct nbdkit_extents_map *extents_map,
> +                             int (*fn) (uint64_t offset, uint64_t length,
> +                                        uint32_t type, void *opaque),
> +                             void *opaque);
> +
> +Iterate over the extents in order, calling C<fn> for each. 
C<fn>
> +should return 0 on success or -1 on failure.  If a call to C<fn>
fails
> +then C<nbdkit_extents_foreach> also fails immediately.
Question - should foreach allow the caller to pass in bounds? That is,
since the plugin can populate 0 - full_size, but where we know the
client only asked for information on offset - length, it may be more
efficient to visit just the extents lying within those bounds. (Maybe
allow -1 as a shortcut rather than an actual end size, when the ending
offset is less important).

Is it worth an explicit guarantee that foreach() visits extents in
ascending order, regardless of whether add() was called out of order?
(Ties us to our implementation where we split/coalesce per add - but
seems like it can make life easier for filters if we do have that
guarantee).
> +
> +C<nbdkit_extents_foreach> returns C<0> on success or
C<-1> on failure.
> +However it does not call C<nbdkit_error> or
C<nbdkit_set_error> - it
> +is expected that C<fn> will call those functions on failure.
either the failing C<fn>, or the caller of foreach().

No documentation of nbdkit_extents_{new,free}?
>  
> +#define NBDKIT_EXTENT_TYPE_HOLE 0
> +#define NBDKIT_EXTENT_TYPE_DATA 1
> +#define NBDKIT_EXTENT_TYPE_ZERO 2
> +
Repeating the question above about whether to expose all four bit
combinations possible in the NBD protocol.

> +++ b/server/extents.c
> @@ -0,0 +1,67 @@
> +struct nbdkit_extents_map {
> +  // TBD XXX
> +};
Okay, so we're still in the design phase :)

> +static int
> +plugin_extents (struct backend *b, struct connection *conn,
> +                uint32_t count, uint64_t offset, uint32_t flags,
> +                struct nbdkit_extents_map *extents_map,
> +                int *err)
> +{
> +  struct backend_plugin *p = container_of (b, struct backend_plugin,
backend);
> +  bool req_one = flags & NBDKIT_FLAG_REQ_ONE;
> +  int r;
> +  int can_extents = 0; /* XXX probably should be cached per connection */
There's probably a lot of the .can_FOO calls that should be cached per
connection.

Still, it's looking fairly reasonable.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Richard W.M. Jones

2019-Mar-12 19:52 UTC

head link

Re: [Libguestfs] [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.

On Tue, Mar 12, 2019 at 01:58:52PM -0500, Eric Blake
wrote:> On 3/12/19 12:11 PM, Richard W.M. Jones wrote:
> > This pair of calls allows plugins to describe which extents in the
> > virtual disk are allocated, holes or zeroes.
> > ---
> 
> > +++ b/docs/nbdkit-filter.pod
> 
> > +The C<extents_map> parameter passed to this function is empty.
> 
> True only if any earlier filter also passed in an empty map. Maybe it's
> worth explicitly mentioning that the filter must in turn pass an empty
> map if it calls next_ops?
This is true and the text should be clarified.  However I'm having a
hard time thinking of a case where this would be useful.
> >  If the
> > +filter does not need to adjust extents from the underlying plugin it
> > +can simply pass this through to the layer below:
> > +
> > + myfilter_extents (...)
> > + {
> > +   return next_ops->extents (nxdata, count, offset, flags,
extents_map, err);
> > + }
> > +
> > +It is also possible for filters to transform the extents map received
> > +back from the layer below.  Usually this must be done by allocating a
> > +new map which is passed to the layer below.  Without error handling
it
> > +would look like this:
> > +
> > + myfilter_extents (...)
> > + {
> > +   struct nbdkit_extents_map *map2 = nbdkit_extents_new ();
> > +   next_ops->extents (nxdata, count, offset, flags, map2, err);
> > +   /* transform map2 and return results in extents_map */
> > +   nbdkit_extents_foreach (map2, transform_offset, extents_map);
> > +   nbdkit_extents_free (map2);
> > + }
> 
> And the fact that we can easily call nbdkit_extents_new() as needed
> means that a filter could, for example, call next_ops->extents() even
> for its pread implementation (of course, only for a plugin that
> .can_extents), if the filter knowing which parts of the plugin are
> sparse can make the filter run more efficiently on calls other than
> NBD_CMD_BLOCK_STATUS.
Agreed.
> > +=head2 C<.extents>
> > +
> > + int extents (void *handle, uint32_t count, uint64_t offset,
> > +              uint32_t flags,
> > +              struct nbdkit_extents_map *extents_map);
> > +
> > +During the data serving phase, this callback is used to
> > +detect allocated (non-sparse) regions of the disk.
> > +
> > +This function will not be called if C<.can_extents> returned
false.
> 
> Is it worth mentioning that nbdkit's default behavior is to treat the
> entire image as allocated non-zero if .can_extents returns false?
Yes I'll clarify this too.
> > +The callback should detect and return the list of extents overlapping
> > +the range C<[offset...offset+count-1]>.  The
C<extents_map> parameter
> > +points to an opaque object which the callback should fill in by
> > +calling C<nbdkit_extent_add> etc.  See L</Extents map>
below.
> > +
> > +The C<flags> parameter may contain the flag
C<NBDKIT_FLAG_REQ_ONE>
> > +which means that the client is only requesting information about the
> > +extent overlapping C<offset>.  The plugin may ignore this flag,
or as
> > +an optimization it may return just a single extent for
C<offset>.
> > +
> > +If there is an error, C<.extents> should call
C<nbdkit_error> with an
> > +error message, and C<nbdkit_set_error> to record an appropriate
error
> > +(unless C<errno> is sufficient), then return C<-1>.
> > +
> > +=head3 Extents map
> > +
> > +The plugin C<extents> callback is passed an opaque pointer
C<struct
> > +nbdkit_extents_map *extents_map>.  This structure represents a
list of
> > +L<filesystem
extents|https://en.wikipedia.org/wiki/Extent_(file_systems)>
> > +describing which areas of the disk are allocated, which are sparse
> > +(“holes”), and, if supported, which are zeroes.  Initially the map
> > +describes a single hole covering the entire virtual disk.
> 
> Is an initial hole the wisest default, or should the initial state be
> data, and the plugin can do an nbdkit_extent_add(0, size, HOLE) if the
> client finds it easier to start with all holes? After all, the NBD
> protocol states that returning data/allocated is always the safest
> default (returning hole and/or read-zeroes should only be done when we
> are sure about it, but returning data/allocated is fine even if it
> causes the client to be less efficient for reading holes that it could
> have otherwise skipped).
The idea behind making hole the default is that it makes the VDDK
plugin slightly easier to write.  However the VDDK plugin as you say
could just call nbdkit_extents_clear() in this case.

The other argument for making hole default is that it makes the
_extents_map_ easier to write (as a hole would be implemented as an
empty map).

I take the point though and will think about this some more.
> > +The C<extents> callback has to fill in any allocated or zeroes
areas
> > +by calling C<nbdkit_extent*> functions, to describe all extents
> > +overlapping the range C<[offset...offset+count-1]>.  (nbdkit
will
> > +ignore extents outside this range so plugins may, if it is easier to
> > +implement, return all extent information for the whole disk.)
> 
> Is it worth mentioning that speed is slightly more important than
> accuracy (that is, it's better to quickly return DATA than to slowly
> read the are in question and memcmp() to see if it is all zeros in order
> to return a more precise ZERO; ideally, returning anything other than
> DATA should only be done when the time collecting that information is
> proportionate to the number of requests rather to the size of the region
> being queried).
Yes will clarify.
> Is nbdkit allowed to cache the information learned from the plugin for
> regions outside the client's initial query for later use, or are we
> better off always asking for information again (even if the answer we
> get will be the same)?
I'm inclined not to cache things because it's a premature optimization
and it makes the functioning of the server hard to reason about.  If a
particular operation is slow we can optimize it later.
> > +
> > +The following functions are available for plugins or filters to call:
> > +
> > + int nbdkit_extent_add (struct nbdkit_extents_map *extents_map,
> > +                        uint64_t offset, uint64_t length, uint32_t
type);
> > +
> > +Add an extent covering C<[offset...offset+length-1]> of one of
> > +the following types:
> > +
> > +=over 4
> > +
> > +=item NBDKIT_EXTENT_TYPE_HOLE
> > +
> > +An unallocated extent, a.k.a. a “hole”.
> > +
> > +=item NBDKIT_EXTENT_TYPE_DATA
> > +
> > +A normal extent containing data.
> > +
> > +=item NBDKIT_EXTENT_TYPE_ZERO
> > +
> > +An extent which is allocated and contains all zero bytes.
> 
> The protocol allows all four combinations of the two bits (you could
> have a non-allocated area not known to read as zeroes, for example, some
> SCSI disks have this property); should we likewise have four potential
> different types, rather than just three?
Interesting, I didn't know we would support that, but yes.
> > +=back
> > +
> > +If the new extent overlaps some previous extent in the map then the
> > +overlapping part(s) are overwritten with the new type.  Adjacent
> > +extents of the same type may also be coalesced.  Thus you
shouldn't
> > +assume that C<nbdkit_extent_add> always adds exactly one more
extent
> > +to the map.
> 
> That's a nice aspect - it means a client CAN call add(0, full_size,
> HOLE) and then refine it with subsequent add(offset, length, DATA). It
> also seems like a fairly straightforward implementation - keep a sorted
> list of extents, and then split/coalescse into the right sorted location
> as more add() requests come in.  Then it is an implementation detail
> whether we track the list by array, linked list, or binary tree, which
> in turn may be determined by how many extents we expect to be dealing
> with (choosing an implementation with O(log n) lookup/insertion rather
> than O(n) in case the client gives us LOTS of add() calls may suggest
> that a tree is best, but prototyping with an array for the initial cut
> is less code complexity).
Agreed.
> > +
> > +C<nbdkit_extent_add> returns C<0> on success or
C<-1> on failure.  On
> > +failure C<nbdkit_error> and/or C<nbdkit_set_error> has
already been
> > +called.
> > +
> > + void nbdkit_extents_clear (struct nbdkit_extents_map *extents_map);
> > +
> > +Clear the extents map.  After this the map will have a single hole
> > +covering the entire disk.
> > +
> > + int nbdkit_extents_foreach (const struct nbdkit_extents_map
*extents_map,
> > +                             int (*fn) (uint64_t offset, uint64_t
length,
> > +                                        uint32_t type, void *opaque),
> > +                             void *opaque);
> > +
> > +Iterate over the extents in order, calling C<fn> for each. 
C<fn>
> > +should return 0 on success or -1 on failure.  If a call to
C<fn> fails
> > +then C<nbdkit_extents_foreach> also fails immediately.
> 
> Question - should foreach allow the caller to pass in bounds? That is,
> since the plugin can populate 0 - full_size, but where we know the
> client only asked for information on offset - length, it may be more
> efficient to visit just the extents lying within those bounds. (Maybe
> allow -1 as a shortcut rather than an actual end size, when the ending
> offset is less important).
Yes it's a good idea.
> Is it worth an explicit guarantee that foreach() visits extents in
> ascending order, regardless of whether add() was called out of order?
> (Ties us to our implementation where we split/coalesce per add - but
> seems like it can make life easier for filters if we do have that
> guarantee).
Yes, it should always call them "in order" (as it says).  Maybe this
isn't clear?  Perhaps I should say "in offset order" or something
like
that.
> > +
> > +C<nbdkit_extents_foreach> returns C<0> on success or
C<-1> on failure.
> > +However it does not call C<nbdkit_error> or
C<nbdkit_set_error> - it
> > +is expected that C<fn> will call those functions on failure.
> 
> either the failing C<fn>, or the caller of foreach().
Yes, will update.
> No documentation of nbdkit_extents_{new,free}?
Yeah I didn't want to document them in nbdkit-plugin.pod, but then
forgot to document them in nbdkit-filter.pod.  I believe they should
never be called by plugins (although that's difficult to enforce).

[...]

Thanks,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/

Eric Blake

2019-Mar-12 20:24 UTC

head link

Re: [Libguestfs] [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.

On 3/12/19 2:52 PM, Richard W.M. Jones wrote:
>>> +
>>> +Iterate over the extents in order, calling C<fn> for each. 
C<fn>
>>> +should return 0 on success or -1 on failure.  If a call to
C<fn> fails
>>> +then C<nbdkit_extents_foreach> also fails immediately.
>>
>> Question - should foreach allow the caller to pass in bounds? That is,
>> since the plugin can populate 0 - full_size, but where we know the
>> client only asked for information on offset - length, it may be more
>> efficient to visit just the extents lying within those bounds. (Maybe
>> allow -1 as a shortcut rather than an actual end size, when the ending
>> offset is less important).
> 
> Yes it's a good idea.
It might also be worth a bool parameter to make it easier to implement
the REQ_ONE flag by stopping the iteration after a single call to <fn>,
regardless of whether fn retured 0 or -1? (But there, it's also fairly
easy to write fn to return -1 without calling nbdkit_error, and to use
opaque as a struct that contains the real error code to know whether the
foreach() returning -1 is fatal and worth raising an error or just an
early success path).

If an empty map has a useful default (whether you decide for a default
of HOLE vs. DATA), I suppose that means foreach() will always call fn at
least once for any given bounds, even when the plugin didn't call _add?
> 
>> Is it worth an explicit guarantee that foreach() visits extents in
>> ascending order, regardless of whether add() was called out of order?
>> (Ties us to our implementation where we split/coalesce per add - but
>> seems like it can make life easier for filters if we do have that
>> guarantee).
> 
> Yes, it should always call them "in order" (as it says).  Maybe
this
> isn't clear?  Perhaps I should say "in offset order" or
something like
> that.
"in offset order" helps (as distinct from "in _add() order")
> 
>> No documentation of nbdkit_extents_{new,free}?
> 
> Yeah I didn't want to document them in nbdkit-plugin.pod, but then
> forgot to document them in nbdkit-filter.pod.  I believe they should
> never be called by plugins (although that's difficult to enforce).
You could enforce by having _new/_free only declared in nbdkit-filter.h
while the others are in nbdkit-common.h. (a plugin can't easily call a
function that isn't declared). But I don't know if we have to go to that
much effort, or go with the simpler use of a comment in nbdkit-common.h
that states something like "these functions are only intended for use in
filters".

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Seemingly Similar Threads

Search for more possibly parallel threads

Libguestfs - Mar 2019 - Re: [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.

Re: [Libguestfs] [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.

Re: [Libguestfs] [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.

Re: [Libguestfs] [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.

Seemingly Similar Threads