thr3ads.net - Libguestfs - Re: [Libguestfs] New extents structure proposal [Mar 2019]

If this information is useful, please help other people find it:
Share via:

Richard W.M. Jones

2019-Mar-20 15:31 UTC

[Libguestfs] New extents structure proposal

I think the extents map is just too complicated and is unnecessarily
so.  How about instead we define the plugin interface to be:

  int can_extents (void *handle);  // as before
  int extents (void *handle, uint32_t count, uint64_t offset, uint32_t flags,
               struct nbdkit_extents_list *list);

and have the extents_list be a simple list.  The first extent you add
must start at offset.  And you're only allowed to append monotonically
increasing adjacent extents to it.  Plugins must return at least one
extent, but are not required to return more than one extent
(regardless of flags).

This would be simple enough to implement for both VDDK and file.  (For
VDDK the plugin needs to synthesize the holes but that's only a slight
complication).

NBDKIT_FLAG_REQ_ONE is now simple to implement (if generating the
extents, stop after generating the first one; if it's the server
processing the extents, take the first extent in the list).

The ‘end’ that we discussed before is now implicit (it's the byte of
the last extent in the list).

Also an observation: qemu's nbd client only ever issues block status
requests with the req-one flag set, so perhaps we should optimize for
that case.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

Eric Blake

2019-Mar-20 16:20 UTC

head link

Re: [Libguestfs] New extents structure proposal

On 3/20/19 10:31 AM, Richard W.M. Jones wrote:> I think the extents map is just too complicated and is unnecessarily
> so.  How about instead we define the plugin interface to be:
> 
>   int can_extents (void *handle);  // as before
>   int extents (void *handle, uint32_t count, uint64_t offset, uint32_t
flags,
>                struct nbdkit_extents_list *list);
> 
> and have the extents_list be a simple list.  The first extent you add
> must start at offset.  And you're only allowed to append monotonically
> increasing adjacent extents to it.  Plugins must return at least one
> extent, but are not required to return more than one extent
> (regardless of flags).
Well, required to return at least one for the command to succeed (if the
return 0, then we synthesize an error, perhaps EIO, to keep the
connection to the client still alive).

I take it you'd still have to use an nbdkit_extent_add(list, count,
type), so that additions to the list are still calling malloc() from
nbdkit's perspective (as you don't want the plugin to be realloc()ing
the list)?
> 
> This would be simple enough to implement for both VDDK and file.  (For
> VDDK the plugin needs to synthesize the holes but that's only a slight
> complication).
> 
> NBDKIT_FLAG_REQ_ONE is now simple to implement (if generating the
> extents, stop after generating the first one; if it's the server
> processing the extents, take the first extent in the list).
> 
> The ‘end’ that we discussed before is now implicit (it's the byte of
> the last extent in the list).
So a client may give us more extents than we need, but can also easily
stop giving us extents as early as they want (and as long as they gave
us at least one, the overall call is successful).  The requirement on
the plugin giving us extents in monotonically increasing order is not
too onerous; all your earlier implementations were naturally already
doing that (in other words, the full flexibility of random-access extent
probing rather than linear is over-engineered).

I like the idea.

I'm trying to figure out if filters would still need a foreach visitor.
But my initial guess is no.  Consider - the earlier proposal required
the offset filter to do:

+  struct nbdkit_extents_map *map2;
+
+  map2 = nbdkit_extents_new ();
+  if (map2 == NULL)
+    return -1;
+  if (next_ops->extents (nxdata, count, offs + offset,
+                         flags, map2, err) == -1) {
+    nbdkit_extents_free (map2);
+    return -1;
+  }
+
+  /* Transform offsets in map2, return result in extents_map. */
+  if (nbdkit_extents_foreach (map2, subtract_offset, extents_map,
+                              NBDKIT_EXTENTS_FOREACH_FLAG_RANGE,
+                              offset, range) == -1) {
+    nbdkit_extents_free (map2);
+    return -1;
+  }
+  nbdkit_extents_free (map2);

because the plugin was calling:

+    if (nbdkit_extent_add (extents_map, offset, n, type) == -1)

But if we drop offset from the interface, the offset filter simplifies to:

return next_ops->extents (nxdata, count, offs + offset, flags, list, err);

because the plugin will be calling:

if (nbdkit_extent_add(list, n type) == -1)

> 
> Also an observation: qemu's nbd client only ever issues block status
> requests with the req-one flag set, so perhaps we should optimize for
> that case.
I hope to get to the point where future qemu doesn't send the req-one
flag. There's several threads on the qemu list about how qemu-img is
slower than it needs to be because it is throwing away useful
information, and where it is aggravated by the kernel's abyssmal lseek()
performance on tmpfs.  But until qemu learns useful caching, you're
right that most existing NBD clients that request block status do so one
extent at a time (because I don't know of any other existing NBD clients
that use BLOCK_STATUS yet).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Richard W.M. Jones

2019-Mar-20 16:57 UTC

head link

Re: [Libguestfs] New extents structure proposal

On Wed, Mar 20, 2019 at 11:20:15AM -0500, Eric Blake
wrote:> On 3/20/19 10:31 AM, Richard W.M. Jones wrote:
> > I think the extents map is just too complicated and is unnecessarily
> > so.  How about instead we define the plugin interface to be:
> > 
> >   int can_extents (void *handle);  // as before
> >   int extents (void *handle, uint32_t count, uint64_t offset, uint32_t
flags,
> >                struct nbdkit_extents_list *list);
> > 
> > and have the extents_list be a simple list.  The first extent you add
> > must start at offset.  And you're only allowed to append
monotonically
> > increasing adjacent extents to it.  Plugins must return at least one
> > extent, but are not required to return more than one extent
> > (regardless of flags).
> 
> Well, required to return at least one for the command to succeed (if the
> return 0, then we synthesize an error, perhaps EIO, to keep the
> connection to the client still alive).
> 
> I take it you'd still have to use an nbdkit_extent_add(list, count,
> type), so that additions to the list are still calling malloc() from
> nbdkit's perspective (as you don't want the plugin to be
realloc()ing
> the list)?
Yes I think we should preserve an abstract data type, not least
because it would allow us to implement a different structure in
future.
> > This would be simple enough to implement for both VDDK and file.  (For
> > VDDK the plugin needs to synthesize the holes but that's only a
slight
> > complication).
> > 
> > NBDKIT_FLAG_REQ_ONE is now simple to implement (if generating the
> > extents, stop after generating the first one; if it's the server
> > processing the extents, take the first extent in the list).
> > 
> > The ‘end’ that we discussed before is now implicit (it's the byte
of
> > the last extent in the list).
> 
> So a client may give us more extents than we need, but can also easily
> stop giving us extents as early as they want (and as long as they gave
> us at least one, the overall call is successful).  The requirement on
> the plugin giving us extents in monotonically increasing order is not
> too onerous; all your earlier implementations were naturally already
> doing that (in other words, the full flexibility of random-access extent
> probing rather than linear is over-engineered).
> 
> I like the idea.
> 
> I'm trying to figure out if filters would still need a foreach visitor.
> But my initial guess is no.  Consider - the earlier proposal required
> the offset filter to do:
> 
> +  struct nbdkit_extents_map *map2;
> +
> +  map2 = nbdkit_extents_new ();
> +  if (map2 == NULL)
> +    return -1;
> +  if (next_ops->extents (nxdata, count, offs + offset,
> +                         flags, map2, err) == -1) {
> +    nbdkit_extents_free (map2);
> +    return -1;
> +  }
> +
> +  /* Transform offsets in map2, return result in extents_map. */
> +  if (nbdkit_extents_foreach (map2, subtract_offset, extents_map,
> +                              NBDKIT_EXTENTS_FOREACH_FLAG_RANGE,
> +                              offset, range) == -1) {
> +    nbdkit_extents_free (map2);
> +    return -1;
> +  }
> +  nbdkit_extents_free (map2);
> 
> because the plugin was calling:
> 
> +    if (nbdkit_extent_add (extents_map, offset, n, type) == -1)
> 
> But if we drop offset from the interface, the offset filter simplifies to:
> 
> return next_ops->extents (nxdata, count, offs + offset, flags, list,
err);
> 
> because the plugin will be calling:
> 
> if (nbdkit_extent_add(list, n type) == -1)
For a simple case like this we might even allow filters to simply
update the offsets in the list.  It requires exposing the inner
working of the data type to filters though.
> > Also an observation: qemu's nbd client only ever issues block
status
> > requests with the req-one flag set, so perhaps we should optimize for
> > that case.
> 
> I hope to get to the point where future qemu doesn't send the req-one
> flag. There's several threads on the qemu list about how qemu-img is
> slower than it needs to be because it is throwing away useful
> information, and where it is aggravated by the kernel's abyssmal
lseek()
> performance on tmpfs.  But until qemu learns useful caching, you're
> right that most existing NBD clients that request block status do so one
> extent at a time (because I don't know of any other existing NBD
clients
> that use BLOCK_STATUS yet).
Is it ever possible to cache block status results?  What happens in
the (admittedly unusual) case where two writers are hitting the same
NBD server?  For example if the server is implementing a cluster
filesystem.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

Possibly Parallel Threads

Search for more apparently analagous threads

Libguestfs - Mar 2019 - Re: New extents structure proposal

[Libguestfs] New extents structure proposal

Re: [Libguestfs] New extents structure proposal

Re: [Libguestfs] New extents structure proposal

Possibly Parallel Threads