Martin Kletzander
2019-May-20 13:23 UTC
Re: [Libguestfs] [nbdkit PATCH v2] Introduce cacheextents filter
On Thu, May 16, 2019 at 08:05:18AM -0500, Eric Blake wrote:
>On 5/15/19 5:39 PM, Martin Kletzander wrote:
>> This filter caches the last result of the extents() call and offers a nice
>> speed-up for clients that only support req_on=1 in combination with plugins like
>
>s/on/one/
>
>> vddk, which has no overhead for returning information for multiple extents in
>> one call, but that call is very time-consuming.
>>
>> Quick test showed that on a fast connection and a sparsely allocated 16G disk
>> with an OS installed, `qemu-img map` runs in 16s instead of 33s (out of which
>> it takes 5s to get to the first extents request).  For a 100G disk with no
>> data on it, that is one hole extent spanning the whole disk (where there is
>> no space for improvement), this does not add any noticeable overhead.
>>
>> Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
>> ---
>
>> +++ b/filters/cacheextents/cacheextents.c
>
>> +static int
>> +cacheextents_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
>> +                      void *handle, uint32_t count, uint64_t offset,
>> +                      uint32_t flags, struct nbdkit_extents *extents,
>> +                      int *err)
>> +{
>> +  ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
>> +
>> +  nbdkit_debug ("cacheextents:"
>> +                " cache_start=%" PRIu64
>> +                " cache_end=%" PRIu64
>> +                " cache_extents=%p",
>> +                cache_start, cache_end, cache_extents);
>> +
>> +  if (cache_extents &&
>> +      offset >= cache_start && offset < cache_end) {
>> +    nbdkit_debug ("cacheextents: returning from cache");
>> +    return cacheextents_add (extents, err);
>> +  }
>> +
>> +  nbdkit_debug ("cacheextents: cache miss");
>> +  int r = next_ops->extents (nxdata, count, offset, flags, extents, err);
>
>This is a bit pessimistic.  Observe:
>  request A (offset=0, count=1G) populates extents (0-.5G data, .5G-1.5G hole)
>  request B (offset=.5G, count=1G) serviced from the cache
>compared to:
>  request A (offset=0, count=1G) populates extents (0-.5G data, .5G-1.0G hole)
>  request B (offset=.5G, count=1G) treated as cache miss
>
>It should be possible to note that request B overlaps with the cache,
>and to therefore do a three-step action: fill 'extents' with the tail of
>the cached extents (filling .5-1.0G), call next_ops->extents for the
>remainder (1.0G-wherever), extend the cache to fill out the additional
>information learned.
>

I'm not quite sure what you mean.  Are you suggesting keeping the previous
information and just updating it with the new one?  If yes, then my response
would be yes, sure, I can do that, but...

1) I would have to change the way the cached data are stored
2) Make sure that when updating the data nothing is rewritten
3) When returning data from cache, check for continuity

and while we're at it, we could also:

4) Make the responses to write-actions more granular instead of just
   throwing away the cache.

Of course, none of the above is impossible or blocking, I just don't think it
would provide much added value, since most of the time the requests will be
sequential, and if there is a quick response (or the client is not using
req_one), this would not speed things up.  So this is quite a niche filter
for now.

>> +  if (r == -1)
>> +    return r;
>> +
>> +  return cacheextents_fill (extents, err);
>> +}
>> +
>
>> +++ b/filters/cacheextents/nbdkit-cacheextents-filter.pod
>> @@ -0,0 +1,47 @@
>> +=head1 NAME
>> +
>> +nbdkit-cacheextents-filter - cache extents
>
>> +=head1 SEE ALSO
>> +
>> +L<nbdkit(1)>,
>> +L<nbdkit-cache-filter(1)>,
>> +L<nbdkit-readahead-filter(1)>,
>> +L<nbdkit-vddk-plugin(1)>,
>> +L<nbdkit-filter(3)>,
>> +L<qemu-img(1)>.
>> +
>
>I'd also patch filters/nbdkit-cache-filter.pod to cross-link to the
>cacheextents filter.  And nbdkit-filter.pod probably needs an update to
>point to all filters (hmm, I missed that in my recent proposed patch to
>add a nocache filter; that filter could also do with a link to
>cacheextents, as the two are orthogonal and can both be used at once)
>

Good point.

>--
>Eric Blake, Principal Software Engineer
>Red Hat, Inc.           +1-919-301-3226
>Virtualization:  qemu.org | libvirt.org
Eric Blake
2019-May-20 14:08 UTC
Re: [Libguestfs] [nbdkit PATCH v2] Introduce cacheextents filter
On 5/20/19 8:23 AM, Martin Kletzander wrote:
>>> +  if (cache_extents &&
>>> +      offset >= cache_start && offset < cache_end) {
>>> +    nbdkit_debug ("cacheextents: returning from cache");
>>> +    return cacheextents_add (extents, err);
>>> +  }
>>> +
>>> +  nbdkit_debug ("cacheextents: cache miss");
>>> +  int r = next_ops->extents (nxdata, count, offset, flags, extents, err);
>>
>> This is a bit pessimistic.  Observe:
>>   request A (offset=0, count=1G) populates extents (0-.5G data, .5G-1.5G hole)
>>   request B (offset=.5G, count=1G) serviced from the cache
>> compared to:
>>   request A (offset=0, count=1G) populates extents (0-.5G data, .5G-1.0G hole)
>>   request B (offset=.5G, count=1G) treated as cache miss
>>
>> It should be possible to note that request B overlaps with the cache,
>> and to therefore do a three-step action: fill 'extents' with the tail of
>> the cached extents (filling .5-1.0G), call next_ops->extents for the
>> remainder (1.0G-wherever), extend the cache to fill out the additional
>> information learned.
>
> I'm not quite sure what you mean.  Are you suggesting keeping the previous
> information and just updating it with the new one?  If yes, then my response
> would be yes, sure, I can do that, but...

Okay, re-reading the code, you aren't checking whether offset+count lies
within the cache, but are instead truncating your answer back to the
client to the amount that was previously cached, and relying on the
client's next request to likely start at the first byte that was not
cached.  So at most, you end up splitting what could have been a
contiguous answer that crossed the cache_end boundary from one extent
reply into two.  Still, I wonder if clearing the REQ_ONE flag before
calling into next_ops->extents(), or possibly even calling into
next_ops->extents() with count widened out to a larger value, to learn
as much information as possible in one pass, can result in fewer calls
into next_ops.

> 1) I would have to change the way the cached data are stored

No, you'd still store the cache in the same way.  It is more a matter of
noting that what you have stored has a start and end boundary, and your
incoming request partially overlaps what you already have stored, so use
the cached values for the first half as well as limiting the call to the
actual plugin to the second half.  Something like:

  bool append = false;
  if (cache_extents &&
      offset >= cache_start && offset < cache_end) {
    /* Partially or completely cached */
    if (cacheextents_add (extents, err) == -1)
      return -1;
    if (offset + count <= cache_end)
      return 0;
    count -= cache_end - offset;
    offset = cache_end;
    append = true;
  }

  /* Miss or partially cached */
  if (next_ops->extents (nxdata, count, offset, flags, extents, err) == -1)
    return -1;

  return cacheextents_fill (extents, append, err);

and maybe even consider passing in a larger count to next_ops->extents,
or even intentionally clearing the REQ_ONE flag before calling into
next_ops, if our goal is to get as much information as feasible from
next_ops for saving in our cache.

> 2) Make sure that when updating the data nothing is rewritten
> 3) When returning data from cache check for continuity
>
> and while we're at it, we could also:
>
> 4) Make the responses to write-actions more granular instead of just
>    throwing away the cache.

If the point of the cache is to speed up one-time read passes, we aren't
writing.  I think invalidating the entire cache on writes is just fine,
where the focus is to make reads faster when there are no competing writes.

> Of course, none of the above is impossible or blocking, I just don't
> think it would provide much added value, since most of the time the
> requests will be sequential, and if there is a quick response (or the
> client is not using req_one), this would not speed things up.  So this
> is quite a niche filter for now.

So maybe it's not worth the effort of trying anything fancier; it was more
a question to at least let us think about it.  I'm fine if we go with your
initial approach as the version we check in first.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
Martin Kletzander
2019-May-21 10:17 UTC
Re: [Libguestfs] [nbdkit PATCH v2] Introduce cacheextents filter
On Mon, May 20, 2019 at 09:08:02AM -0500, Eric Blake wrote:
>On 5/20/19 8:23 AM, Martin Kletzander wrote:
>
>>>> +  if (cache_extents &&
>>>> +      offset >= cache_start && offset < cache_end) {
>>>> +    nbdkit_debug ("cacheextents: returning from cache");
>>>> +    return cacheextents_add (extents, err);
>>>> +  }
>>>> +
>>>> +  nbdkit_debug ("cacheextents: cache miss");
>>>> +  int r = next_ops->extents (nxdata, count, offset, flags, extents, err);
>>>
>>> This is a bit pessimistic.  Observe:
>>>   request A (offset=0, count=1G) populates extents (0-.5G data, .5G-1.5G hole)
>>>   request B (offset=.5G, count=1G) serviced from the cache
>>> compared to:
>>>   request A (offset=0, count=1G) populates extents (0-.5G data, .5G-1.0G hole)
>>>   request B (offset=.5G, count=1G) treated as cache miss
>>>
>>> It should be possible to note that request B overlaps with the cache,
>>> and to therefore do a three-step action: fill 'extents' with the tail of
>>> the cached extents (filling .5-1.0G), call next_ops->extents for the
>>> remainder (1.0G-wherever), extend the cache to fill out the additional
>>> information learned.
>>
>> I'm not quite sure what you mean.  Are you suggesting keeping the previous
>> information and just updating it with the new one?  If yes, then my
>> response would be yes, sure, I can do that, but...
>
>Okay, re-reading the code, you aren't checking whether offset+count lies
>within the cache, but are instead truncating your answer back to the
>client to the amount that was previously cached, and relying on the
>client's next request to likely start at the first byte that was not
>cached.  So at most, you end up splitting what could have been a
>contiguous answer that crossed the cache_end boundary from one extent
>reply into two.  Still, I wonder if clearing the REQ_ONE flag before
>calling into next_ops->extents(), or possibly even calling into
>next_ops->extents() with count widened out to a larger value, to learn
>as much information as possible in one pass, can result in fewer calls
>into next_ops.
>
>> 1) I would have to change the way the cached data are stored
>
>No, you'd still store the cache in the same way.  It is more a matter of
>noting that what you have stored has a start and end boundary, and your
>incoming request partially overlaps what you already have stored, so use
>the cached values for the first half as well as limiting the call to the
>actual plugin to the second half.  Something like:
>
>  bool append = false;
>  if (cache_extents &&
>      offset >= cache_start && offset < cache_end) {
>    /* Partially or completely cached */
>    if (cacheextents_add (extents, err) == -1)
>      return -1;
>    if (offset + count <= cache_end)
>      return 0;
>    count -= cache_end - offset;
>    offset = cache_end;
>    append = true;
>  }
>
>  /* Miss or partially cached */
>  if (next_ops->extents (nxdata, count, offset, flags, extents, err) == -1)
>    return -1;
>
>  return cacheextents_fill (extents, append, err);
>
>and maybe even consider passing in a larger count to next_ops->extents,
>or even intentionally clearing the REQ_ONE flag before calling into
>next_ops, if our goal is to get as much information as feasible from
>next_ops for saving in our cache.
>

Oh, now I understand.  This makes sense if the client is not using req_one.
If it is, then it will interpret the first extent and request the next one;
even though we (probably) have the whole extent in cache, we do another
request to the plugin because offset+count > cache_end, even though the
extra data will not yet be interpreted by the client.

Passing a larger count and clearing the req_one flag makes sense, as the
plugins themselves are supposed to choose the most efficient way of
returning the information (e.g. (not) honouring req_one, not having to fill
@count bytes).  I'll adjust that.

>> 2) Make sure that when updating the data nothing is rewritten
>> 3) When returning data from cache check for continuity
>>
>> and while we're at it, we could also:
>>
>> 4) Make the responses to write-actions more granular instead of just
>>    throwing away the cache.
>
>If the point of the cache is to speed up one-time read passes, we aren't
>writing.  I think invalidating the entire cache on writes is just fine,
>where the focus is to make reads faster when there are no competing writes.
>
>> Of course, none of the above is impossible or blocking, I just don't
>> think it would provide much added value, since most of the time the
>> requests will be sequential, and if there is a quick response (or the
>> client is not using req_one), this would not speed things up.  So this
>> is quite a niche filter for now.
>
>So maybe it's not worth the effort of trying anything fancier; it was more
>a question to at least let us think about it.  I'm fine if we go with your
>initial approach as the version we check in first.
>

Thanks.  I'm glad we reached a mutual understanding.  There will always be
room for improvement; the question is whether there are enough use cases
for this filter.

>--
>Eric Blake, Principal Software Engineer
>Red Hat, Inc.           +1-919-301-3226
>Virtualization:  qemu.org | libvirt.org