I think the extents map is just too complicated and is unnecessarily so. How about instead we define the plugin interface to be: int can_extents (void *handle); // as before int extents (void *handle, uint32_t count, uint64_t offset, uint32_t flags, struct nbdkit_extents_list *list); and have the extents_list be a simple list. The first extent you add must start at offset. And you're only allowed to append monotonically increasing adjacent extents to it. Plugins must return at least one extent, but are not required to return more than one extent (regardless of flags). This would be simple enough to implement for both VDDK and file. (For VDDK the plugin needs to synthesize the holes but that's only a slight complication). NBDKIT_FLAG_REQ_ONE is now simple to implement (if generating the extents, stop after generating the first one; if it's the server processing the extents, take the first extent in the list). The ‘end’ that we discussed before is now implicit (it's the byte of the last extent in the list). Also an observation: qemu's nbd client only ever issues block status requests with the req-one flag set, so perhaps we should optimize for that case. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
On 3/20/19 10:31 AM, Richard W.M. Jones wrote:> I think the extents map is just too complicated and is unnecessarily > so. How about instead we define the plugin interface to be: > > int can_extents (void *handle); // as before > int extents (void *handle, uint32_t count, uint64_t offset, uint32_t flags, > struct nbdkit_extents_list *list); > > and have the extents_list be a simple list. The first extent you add > must start at offset. And you're only allowed to append monotonically > increasing adjacent extents to it. Plugins must return at least one > extent, but are not required to return more than one extent > (regardless of flags).Well, required to return at least one for the command to succeed (if the return 0, then we synthesize an error, perhaps EIO, to keep the connection to the client still alive). I take it you'd still have to use an nbdkit_extent_add(list, count, type), so that additions to the list are still calling malloc() from nbdkit's perspective (as you don't want the plugin to be realloc()ing the list)?> > This would be simple enough to implement for both VDDK and file. (For > VDDK the plugin needs to synthesize the holes but that's only a slight > complication). > > NBDKIT_FLAG_REQ_ONE is now simple to implement (if generating the > extents, stop after generating the first one; if it's the server > processing the extents, take the first extent in the list). > > The ‘end’ that we discussed before is now implicit (it's the byte of > the last extent in the list).So a client may give us more extents than we need, but can also easily stop giving us extents as early as they want (and as long as they gave us at least one, the overall call is successful). The requirement on the plugin giving us extents in monotonically increasing order is not too onerous; all your earlier implementations were naturally already doing that (in other words, the full flexibility of random-access extent probing rather than linear is over-engineered). I like the idea. I'm trying to figure out if filters would still need a foreach visitor. But my initial guess is no. Consider - the earlier proposal required the offset filter to do: + struct nbdkit_extents_map *map2; + + map2 = nbdkit_extents_new (); + if (map2 == NULL) + return -1; + if (next_ops->extents (nxdata, count, offs + offset, + flags, map2, err) == -1) { + nbdkit_extents_free (map2); + return -1; + } + + /* Transform offsets in map2, return result in extents_map. */ + if (nbdkit_extents_foreach (map2, subtract_offset, extents_map, + NBDKIT_EXTENTS_FOREACH_FLAG_RANGE, + offset, range) == -1) { + nbdkit_extents_free (map2); + return -1; + } + nbdkit_extents_free (map2); because the plugin was calling: + if (nbdkit_extent_add (extents_map, offset, n, type) == -1) But if we drop offset from the interface, the offset filter simplifies to: return next_ops->extents (nxdata, count, offs + offset, flags, list, err); because the plugin will be calling: if (nbdkit_extent_add(list, n type) == -1)> > Also an observation: qemu's nbd client only ever issues block status > requests with the req-one flag set, so perhaps we should optimize for > that case.I hope to get to the point where future qemu doesn't send the req-one flag. There's several threads on the qemu list about how qemu-img is slower than it needs to be because it is throwing away useful information, and where it is aggravated by the kernel's abyssmal lseek() performance on tmpfs. But until qemu learns useful caching, you're right that most existing NBD clients that request block status do so one extent at a time (because I don't know of any other existing NBD clients that use BLOCK_STATUS yet). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
On Wed, Mar 20, 2019 at 11:20:15AM -0500, Eric Blake wrote:> On 3/20/19 10:31 AM, Richard W.M. Jones wrote: > > I think the extents map is just too complicated and is unnecessarily > > so. How about instead we define the plugin interface to be: > > > > int can_extents (void *handle); // as before > > int extents (void *handle, uint32_t count, uint64_t offset, uint32_t flags, > > struct nbdkit_extents_list *list); > > > > and have the extents_list be a simple list. The first extent you add > > must start at offset. And you're only allowed to append monotonically > > increasing adjacent extents to it. Plugins must return at least one > > extent, but are not required to return more than one extent > > (regardless of flags). > > Well, required to return at least one for the command to succeed (if the > return 0, then we synthesize an error, perhaps EIO, to keep the > connection to the client still alive). > > I take it you'd still have to use an nbdkit_extent_add(list, count, > type), so that additions to the list are still calling malloc() from > nbdkit's perspective (as you don't want the plugin to be realloc()ing > the list)?Yes I think we should preserve an abstract data type, not least because it would allow us to implement a different structure in future.> > This would be simple enough to implement for both VDDK and file. (For > > VDDK the plugin needs to synthesize the holes but that's only a slight > > complication). > > > > NBDKIT_FLAG_REQ_ONE is now simple to implement (if generating the > > extents, stop after generating the first one; if it's the server > > processing the extents, take the first extent in the list). > > > > The ‘end’ that we discussed before is now implicit (it's the byte of > > the last extent in the list). > > So a client may give us more extents than we need, but can also easily > stop giving us extents as early as they want (and as long as they gave > us at least one, the overall call is successful). The requirement on > the plugin giving us extents in monotonically increasing order is not > too onerous; all your earlier implementations were naturally already > doing that (in other words, the full flexibility of random-access extent > probing rather than linear is over-engineered). > > I like the idea. > > I'm trying to figure out if filters would still need a foreach visitor. > But my initial guess is no. Consider - the earlier proposal required > the offset filter to do: > > + struct nbdkit_extents_map *map2; > + > + map2 = nbdkit_extents_new (); > + if (map2 == NULL) > + return -1; > + if (next_ops->extents (nxdata, count, offs + offset, > + flags, map2, err) == -1) { > + nbdkit_extents_free (map2); > + return -1; > + } > + > + /* Transform offsets in map2, return result in extents_map. */ > + if (nbdkit_extents_foreach (map2, subtract_offset, extents_map, > + NBDKIT_EXTENTS_FOREACH_FLAG_RANGE, > + offset, range) == -1) { > + nbdkit_extents_free (map2); > + return -1; > + } > + nbdkit_extents_free (map2); > > because the plugin was calling: > > + if (nbdkit_extent_add (extents_map, offset, n, type) == -1) > > But if we drop offset from the interface, the offset filter simplifies to: > > return next_ops->extents (nxdata, count, offs + offset, flags, list, err); > > because the plugin will be calling: > > if (nbdkit_extent_add(list, n type) == -1)For a simple case like this we might even allow filters to simply update the offsets in the list. It requires exposing the inner working of the data type to filters though.> > Also an observation: qemu's nbd client only ever issues block status > > requests with the req-one flag set, so perhaps we should optimize for > > that case. > > I hope to get to the point where future qemu doesn't send the req-one > flag. There's several threads on the qemu list about how qemu-img is > slower than it needs to be because it is throwing away useful > information, and where it is aggravated by the kernel's abyssmal lseek() > performance on tmpfs. But until qemu learns useful caching, you're > right that most existing NBD clients that request block status do so one > extent at a time (because I don't know of any other existing NBD clients > that use BLOCK_STATUS yet).Is it ever possible to cache block status results? What happens in the (admittedly unusual) case where two writers are hitting the same NBD server? For example if the server is implementing a cluster filesystem. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Maybe Matching Threads
- Re: New extents structure proposal
- [PATCH nbdkit 5/9] offset: Implement mapping of extents.
- New extents structure proposal
- [PATCH nbdkit 1/9] server: Implement extents/can_extents calls for plugins and filters.
- [PATCH nbdkit] server: Implement extents/can_extents calls for plugins and filters.