Richard W.M. Jones
2018-Nov-21 16:05 UTC
Re: [Libguestfs] [PATCH nbdkit 0/2] Rewrite xz plugin as a filter.
On Wed, Nov 21, 2018 at 09:59:51AM -0600, Eric Blake wrote:
> On 11/21/18 9:46 AM, Richard W.M. Jones wrote:
> >Matt asked if xz should really be a filter rather than a plugin. The
> >answer is yes, of course it should be! That's been something in the
> >todo file for a while.
> >
> >The commit converts the xz plugin code into a filter (leaving the
> >plugin around, but deprecating it).
> >
> >  plugin: nbdkit xz file.xz
> >  filter: nbdkit --filter=xz file file.xz
> >
> >  plugin: # can't be done
> >  filter: nbdkit --filter=xz curl url=https://example.com/disk.xz
>
> And further:
>
> nbdkit --filter=cache --filter=xz curl url=...
>
> to take advantage of local caching rather than repeated curl requests.

The xz plugin includes a block cache, and now so does the filter.
(The plugin long predates our addition of filters to nbdkit.)

I suppose there is a case for removing the block cache code from the
xz plugin and relying on the cache filter instead. I'll test that
out to see if it makes a difference. It would certainly simplify the
xz filter code if we did that, but at the cost of making it a bit more
complex to use.

> >This is only lightly tested but it works for local files and for the
> >curl example given in the commit message. Unfortunately because of
> >the very large block size used in the Fedora cloud image, the curl
> >example is barely usable. We should get them to use a more reasonable
> >block size such as 16M (currently 192M).
>
> This may be the first real random-access of a remote xz file that
> makes the argument for a smaller block size :)
>
> Of course, when you switch to a smaller block size, the xz image
> can't compress quite as far, but hopefully the size difference is
> not that bad. Do you have actual numbers comparing the file size,
> vs. the speed changes made possible by the difference in block size?

In fact, yes, I do. I measured less than 1% overhead in compressed
file size with a 16M block size (see the nbdkit-xz-plugin man page for
details). This is not particularly surprising since 16M is still
pretty large.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
Richard W.M. Jones
2018-Nov-21 16:17 UTC
Re: [Libguestfs] [PATCH nbdkit 0/2] Rewrite xz plugin as a filter.
On Wed, Nov 21, 2018 at 04:05:10PM +0000, Richard W.M. Jones wrote:
> The xz plugin includes a block cache, and now so does the filter.
> (The plugin long predates our addition of filters to nbdkit.)
>
> I suppose there is a case for removing the block cache code from the
> xz plugin and relying on the cache filter instead. I'll test that
> out to see if it makes a difference. It would certainly simplify the
> xz filter code if we did that, but at the cost of making it a bit more
> complex to use.

Actually, I think we are going to need to retain the block cache. It
solves a slightly different problem from placing the cache filter on
top (in fact both are useful).

Let's say you have an XZ file with a 100,000-byte block size. Then
reading the two ranges 0-1000 and 1000-2000, which both lie in the
same xz block, would result in reading and uncompressing that whole
block twice. The block cache in the xz plugin/filter avoids this; the
cache filter on top does not.

Interesting factoid: www.mirrorsite.org rapidly throttles any
connection that makes repeated range requests ... However, if you open
a new connection it is unaffected by the throttling on the existing
connection (I thought it would throttle based on IP address). Anyway,
this, combined with the large block size in the Fedora Cloud image,
makes xz + curl virtually unusable.

I also think the new filter would be better if it made larger reads.
The plugin makes 8K reads (BUFSIZ), which is likely reasonable for
reading from a local file. But the per-request overhead of reading
from the curl plugin probably makes much larger reads sensible. I
wonder if the filter can intuit a good block size to use somehow?

Rich.
Eric Blake
2018-Nov-21 16:29 UTC
Re: [Libguestfs] [PATCH nbdkit 0/2] Rewrite xz plugin as a filter.
On 11/21/18 10:17 AM, Richard W.M. Jones wrote:
> Actually, I think we are going to need to retain the block cache. It
> solves a slightly different problem from placing the cache filter on
> top (in fact both are useful).
>
> Let's say you have an XZ file with a 100,000-byte block size. Then
> reading the two ranges 0-1000 and 1000-2000, which both lie in the
> same xz block, would result in reading and uncompressing that whole
> block twice. The block cache in the xz plugin/filter avoids this; the
> cache filter on top does not.
>
> Interesting factoid: www.mirrorsite.org rapidly throttles any
> connection that makes repeated range requests ... However, if you open
> a new connection it is unaffected by the throttling on the existing
> connection (I thought it would throttle based on IP address). Anyway,
> this, combined with the large block size in the Fedora Cloud image,
> makes xz + curl virtually unusable.
>
> I also think the new filter would be better if it made larger reads.
> The plugin makes 8K reads (BUFSIZ), which is likely reasonable for
> reading from a local file. But the per-request overhead of reading
> from the curl plugin probably makes much larger reads sensible. I
> wonder if the filter can intuit a good block size to use somehow?

Yes, we need to revisit adding block sizing into nbdkit, as filters
may easily optimize based on the preferred block size of the lower
layer, while possibly advertising a different block size up to the
client. The existing nbdkit-blocksize-filter would then gain some
smarts, making it more useful for controlling sizes between layers
(again, back to the question of whether we should improve nbdkit
filters to allow the same filter to be used more than once on a single
plugin).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org