I'm trying to replicate the features of the qemu curl driver in nbdkit's curl plugin, so that we can use nbdkit in virt-v2v to access VMware servers. I've implemented everything else so far [not posted yet] except for readahead.

To my surprise, qemu's curl driver implements readahead itself. I thought it was a curl feature. I'm not completely clear _how_ it works in qemu, but it seems to maintain an array of outstanding AIO requests, search those first to see if one already contains the requested data, and extend the length of other read requests so there is a higher chance they will prefetch data:

  https://github.com/qemu/qemu/blob/230ce19814ecc6bff8edac3b5b86e7c82f422c6c/block/curl.c#L277
  https://github.com/qemu/qemu/blob/230ce19814ecc6bff8edac3b5b86e7c82f422c6c/block/curl.c#L891

Oh and BTW in my testing we found that readahead was very important for performance of virt-v2v! This is because vCenter is really slow and we issue requests to vCenter serially, so avoiding long round trips is vital. (Of course vCenter performance sucks and anyone with any sense uses ‘virt-v2v -it ssh|vddk’, but vCenter unfortunately remains the only zero-config way to convert guests with a pure free software solution.)

So how could we implement something like this in nbdkit? I don't particularly like the idea of extending the curl plugin so it's doing caching as well (essentially what qemu does IIUC).

We already have a cache filter so I have two ideas:

(1) Modify nbdkit-cache-filter to add a readahead parameter.

This would work something like this: If the cache does not map any data following a pread request (up to the size of the readahead), then the pread request to the underlying plugin is extended, and the extra data is added to the cache (but not returned upwards). Prefetched data is then read from the cache as usual.

Unfortunately this could slow down all pread requests, since there's no easy way in a filter to return early with the requested data. Background threads? We've avoided this in filters so far. One problem is that they don't have access to the handle or "nextops" structure.

(2) Add a new readahead filter which extends all pread requests (unconditionally) but throws away the prefetched data.

You would need to use this in conjunction with a cache filter or other caching layer. As Eric points out there is no way to enforce correct layering here, we'd just have to trust the user.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
Richard W.M. Jones
2019-Apr-01 16:55 UTC
Re: [Libguestfs] Readahead in the nbdkit curl plugin
On Mon, Apr 01, 2019 at 02:09:06PM +0100, Richard W.M. Jones wrote:
> We already have a cache filter so I have two ideas:
>
> (1) Modify nbdkit-cache-filter to add a readahead parameter.

There's a big problem that I didn't appreciate til now: The cache filter ends up splitting large reads rather badly. For example if the client is issuing 2M reads (not unreasonable for ‘qemu-img convert’) then the cache filter divides these into 4K requests to the plugin. Compare:

  $ iso='https://download.fedoraproject.org/pub/fedora/linux/releases/29/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-29-1.2.iso'
  $ ./nbdkit -U - -fv \
      curl "$iso" \
      --run 'qemu-img convert -f raw -p $nbd /var/tmp/out'
  nbdkit: curl[1]: debug: pread count=2097152 offset=0
  nbdkit: curl[1]: debug: pread count=2097152 offset=2097152
  nbdkit: curl[1]: debug: pread count=2097152 offset=4194304
  nbdkit: curl[1]: debug: pread count=2097152 offset=6291456

  $ ./nbdkit -U - -fv --filter=cache \
      curl "$iso" \
      --run 'qemu-img convert -f raw -p $nbd /var/tmp/out'
  nbdkit: curl[1]: debug: cache: pread count=2097152 offset=0 flags=0x0
  nbdkit: curl[1]: debug: cache: blk_read block 0 (offset 0) is not cached
  nbdkit: curl[1]: debug: pread count=4096 offset=0
  nbdkit: curl[1]: debug: cache: blk_read block 1 (offset 4096) is not cached
  nbdkit: curl[1]: debug: pread count=4096 offset=4096
  nbdkit: curl[1]: debug: cache: blk_read block 2 (offset 8192) is not cached
  nbdkit: curl[1]: debug: pread count=4096 offset=8192
  nbdkit: curl[1]: debug: cache: blk_read block 3 (offset 12288) is not cached
  nbdkit: curl[1]: debug: pread count=4096 offset=12288
  nbdkit: curl[1]: debug: cache: blk_read block 4 (offset 16384) is not cached
  nbdkit: curl[1]: debug: pread count=4096 offset=16384

(FWIW we want reads of 64M or larger to get decent performance with virt-v2v.)

Unfortunately the cache filter kills performance dead because of round-trip times to the web server.
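To put a rough number on that splitting, here is a small arithmetic sketch. The 2M read size and 4K block size come from the debug output above; the 50 ms round-trip time is purely an assumed figure for illustration, not a measurement, and the function names are made up for this example.

```python
# Toy arithmetic only. BLOCK_SIZE and CLIENT_READ match the debug output
# above; ASSUMED_RTT is a hypothetical round-trip time, not a measured one.
BLOCK_SIZE = 4096
CLIENT_READ = 2 * 1024 * 1024
ASSUMED_RTT = 0.05  # seconds, illustration only


def plugin_requests(read_size, block_size):
    """Number of block-sized preads needed to cover one client read."""
    return -(-read_size // block_size)  # ceiling division


def added_latency(read_size, block_size, rtt_seconds):
    """Extra round-trip time when the blocks are fetched one after another,
    compared with fetching the whole read in a single request."""
    return (plugin_requests(read_size, block_size) - 1) * rtt_seconds


print(plugin_requests(CLIENT_READ, BLOCK_SIZE))
print(added_latency(CLIENT_READ, BLOCK_SIZE, ASSUMED_RTT))
```

Each 2M client read becomes 512 serial requests to the plugin, so under the assumed round-trip time the latency alone would add tens of seconds per client read, before any data transfer time is counted.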
This is a problem with the cache filter that we could likely solve with a bit of effort, but let's go back and take a look at option number 2 again:

> (2) Add a new readahead filter which extends all pread requests

When I'm doing v2v / qemu-img convert I don't really need the cache filter, except that it was a convenient place to save the prefetched data. A dumber readahead filter might help here. Suppose it simply stores the position of the last read and prefetches (and saves) a certain amount of data following that read. If the next read is sequential, and so matches the position pointer, return the saved data, otherwise throw it away and do a normal read.

I believe that this would solve the readahead problem in this case (but I haven't tested it yet).

Rich.
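The "dumber readahead filter" idea can be modelled in a few lines. This is only a toy model in Python, under stated assumptions: it is not nbdkit filter code (filters are written in C against the nbdkit filter API), and the class name, the `window` parameter and the `backing_pread` callback are all hypothetical names invented for this sketch.

```python
class ReadaheadModel:
    """Toy model: remember where the last read ended and keep the data
    that was prefetched past that point."""

    def __init__(self, pread, size, window):
        self.pread = pread      # underlying read: pread(count, offset) -> bytes
        self.size = size        # total size of the backing data
        self.window = window    # extra bytes to prefetch (hypothetical knob)
        self.position = None    # offset just past the last data returned
        self.saved = b""        # prefetched bytes starting at self.position

    def read(self, count, offset):
        # Sequential read that the saved data can satisfy: no backend request.
        if offset == self.position and count <= len(self.saved):
            data, self.saved = self.saved[:count], self.saved[count:]
            self.position = offset + count
            return data
        # Otherwise throw the saved data away and issue one extended read,
        # clamped so it never runs past the end of the backing data.
        want = min(count + self.window, self.size - offset)
        data = self.pread(want, offset)
        self.saved = data[count:]
        self.position = offset + count
        return data[:count]


disk = bytes(range(256)) * 64   # 16 KiB of fake data
calls = []                      # records every request sent to the backend


def backing_pread(count, offset):
    calls.append((count, offset))
    return disk[offset:offset + count]


ra = ReadaheadModel(backing_pread, len(disk), window=8192)
chunks = [ra.read(4096, off) for off in range(0, len(disk), 4096)]
print(calls)
```

In this run four sequential 4K client reads turn into two backend requests, because the first request also fetches the next 8K. A non-sequential read simply discards the saved data and starts again, which matches the "otherwise throw it away and do a normal read" behaviour described above, except that the model still extends that read by the window.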
Apparently Analogous Threads
- Re: [PATCH nbdkit] v2v: Disable readahead for VMware curl sources too (RHBZ#1848862).
- [PATCH nbdkit] v2v: Disable readahead for VMware curl sources too (RHBZ#1848862).
- [PATCH nbdkit] Add readahead filter.
- Re: [PATCH v4 07/12] v2v: nbdkit: Add the readahead filter unconditionally if it is available.
- [PATCH nbdkit v2] Add readahead filter.