Richard W.M. Jones
2021-Jul-27 11:16 UTC
[Libguestfs] Some questions about nbdkit vs qemu performance affecting virt-v2v
Hi Eric, a couple of questions below about nbdkit performance.

Modular virt-v2v will use disk pipelines everywhere.  The input
pipeline looks something like this:

  socket <- cow filter <- cache filter <- nbdkit
                                           curl|vddk

We found there's a notable slowdown in at least one case: when the
source plugin is very slow (e.g. the curl plugin to a slow and remote
website, or VDDK in general), everything runs very slowly.

I made a simple test case to demonstrate this:

$ virt-builder fedora-33
$ time ./nbdkit --filter=cache --filter=delay file /var/tmp/fedora-33.img \
    delay-read=500ms \
    --run 'virt-inspector --format=raw -a "$uri" -vx'

This uses a local file with the delay filter on top injecting
half-second delays into every read.  It "feels" a lot like the slow
case we were observing.  Virt-v2v also does inspection as a first
step when converting an image, so using virt-inspector is somewhat
realistic.

Unfortunately this actually runs far too slowly for me to wait around
- at least 30 mins, and probably a lot longer.  This compares to only
7 seconds if you remove the delay filter.

Reducing the delay to 50ms means at least it finishes in a reasonable
time:

$ time ./nbdkit --filter=cache --filter=delay file /var/tmp/fedora-33.img \
    delay-read=50ms \
    --run 'virt-inspector --format=raw -a "$uri"'

real    5m16.298s
user    0m0.509s
sys     0m2.894s

In the above scenario the cache filter is not actually doing anything
(since virt-inspector does not write).  Adding cache-on-read=true lets
us cache the reads, avoiding going through the "slow" plugin in many
cases, and the result is a lot better:

$ time ./nbdkit --filter=cache --filter=delay file /var/tmp/fedora-33.img \
    delay-read=50ms cache-on-read=true \
    --run 'virt-inspector --format=raw -a "$uri"'

real    0m27.731s
user    0m0.304s
sys     0m1.771s

However this is still slower than the old method which used qcow2 +
qemu's copy-on-read.  It's harder to demonstrate this, but I modified
virt-inspector to use the copy-on-read setting (which it doesn't do
normally).  On top of nbdkit with a 50ms delay and no other filters:

qemu + copy-on-read backed by nbdkit delay-read=50ms file:
real    0m23.251s

So 23s is the time to beat.  (I believe that with longer delays, the
gap between qemu and nbdkit increases in favour of qemu.)

Q1: What other ideas could we explore to improve performance?

- - -

In real scenarios we'll actually want to combine cow + cache, where
cow is caching writes and cache is caching reads:

  socket <- cow filter <- cache filter       <- nbdkit
                          cache-on-read=true    curl|vddk

The cow filter is necessary to prevent changes being written back to
the pristine source image.

This is actually surprisingly efficient, making no noticeable
difference in this test:

$ time ./nbdkit --filter=cow --filter=cache --filter=delay \
    file /var/tmp/fedora-33.img \
    delay-read=50ms cache-on-read=true \
    --run 'virt-inspector --format=raw -a "$uri"'

real    0m27.193s
user    0m0.283s
sys     0m1.776s

Q2: Should we consider a "cow-on-read" flag to the cow filter (thus
removing the need to use the cache filter at all)?

Rich.

--
Richard Jones, Virtualization Group, Red Hat  http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
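For reference, the read pattern behind these numbers can be captured
by putting nbdkit's log filter in front of the cache in the same
test.  A minimal sketch (the logfile path is arbitrary):

$ ./nbdkit --filter=log --filter=cache --filter=delay \
    file /var/tmp/fedora-33.img \
    logfile=/tmp/inspect.log delay-read=50ms \
    --run 'virt-inspector --format=raw -a "$uri"'

The log records each client request with its offset, size and
completion, which makes it possible to see how many reads the
inspection issues, whether any of them are in flight at the same
time, and how often the same offsets are requested more than once.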
Martin Kletzander
2021-Jul-27 12:18 UTC
[Libguestfs] Some questions about nbdkit vs qemu performance affecting virt-v2v
On Tue, Jul 27, 2021 at 12:16:59PM +0100, Richard W.M. Jones wrote:
>Hi Eric, a couple of questions below about nbdkit performance.
>
>[...]
>
>So 23s is the time to beat.  (I believe that with longer delays, the
>gap between qemu and nbdkit increases in favour of qemu.)
>
>Q1: What other ideas could we explore to improve performance?

First thing that came to mind: could it be that qemu's copy-on-read
caches bigger blocks, making it effectively do some small read-ahead
as well?

>[...]
>
>Q2: Should we consider a "cow-on-read" flag to the cow filter (thus
>removing the need to use the cache filter at all)?

That would make at least some sense, since there is cow-on-cache
already (albeit a little confusing to me personally).
I presume it would not increase the size of the difference (when
using qemu-img rebase) at all, right?  However, I do not see how it
would be faster than the existing:

  cow <- cache[cache-on-read]

Martin
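If the difference really is read-ahead, one way to test that theory
on the nbdkit side is the existing readahead filter, slotted between
the cache and the slow plugin in Rich's test.  A rough sketch
(whether it helps at all depends on how sequential virt-inspector's
access pattern actually is):

$ time ./nbdkit --filter=cache --filter=readahead --filter=delay \
    file /var/tmp/fedora-33.img \
    delay-read=50ms cache-on-read=true \
    --run 'virt-inspector --format=raw -a "$uri"'

Comparing this against the 27s figure above would show whether
batching sequential reads into larger requests closes any of the gap
to qemu's copy-on-read.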
Eric Blake
2021-Jul-29 01:50 UTC
[Libguestfs] Some questions about nbdkit vs qemu performance affecting virt-v2v
On Tue, Jul 27, 2021 at 12:16:59PM +0100, Richard W.M. Jones wrote:
> Hi Eric, a couple of questions below about nbdkit performance.
>
> [...]
>
> Reducing the delay to 50ms means at least it finishes in a reasonable
> time:
>
> $ time ./nbdkit --filter=cache --filter=delay file /var/tmp/fedora-33.img \
>     delay-read=50ms \
>     --run 'virt-inspector --format=raw -a "$uri"'
>
> real    5m16.298s
> user    0m0.509s
> sys     0m2.894s

Sounds like the reads are rather serialized (the application is not
proceeding to do a second read until after it has the result of the
first read) rather than highly parallel (where the application would
be reading multiple sites in the image at once, possibly by
requesting the start of a read at two different offsets before
knowing which of those two offsets is even useful).  There's also a
question of how frequently a given portion of the disk image is
re-read (caching will speed things up if data is revisited multiple
times, but just adds overhead if the reads are truly once-only access
for the life of the process).

> In the above scenario the cache filter is not actually doing anything
> (since virt-inspector does not write).  Adding cache-on-read=true lets
> us cache the reads, avoiding going through the "slow" plugin in many
> cases, and the result is a lot better:
>
> [...]
>
> real    0m27.731s
> user    0m0.304s
> sys     0m1.771s

Okay, that sounds like there is indeed frequent re-reading of
portions of the disk (or at least reading of nearby smaller offsets
that fall within the same larger granularity used by the cache).

> However this is still slower than the old method which used qcow2 +
> qemu's copy-on-read.  It's harder to demonstrate this, but I modified
> virt-inspector to use the copy-on-read setting (which it doesn't do
> normally).  On top of nbdkit with a 50ms delay and no other filters:
>
> qemu + copy-on-read backed by nbdkit delay-read=50ms file:
> real    0m23.251s

qemu's copy-on-read creates a qcow2 image backed by a read-only base
image; any read that the qcow2 can't satisfy causes the entire
cluster to be read from the backing image into the qcow2 file, even
if that cluster is larger than what the client was actually reading.
It will benefit from the same speedups of only hitting a given region
of the backing file once in the life of the process.
But it also assumes the presence of a backing chain.  If you try to
use copy-on-read on something that does not have a backing chain
(such as a direct use of an NBD link), the performance suffers (as we
discussed on IRC).  My understanding is that for every read
operation, the COR code does a block status query to see whether the
data was local or came from the backing chain; but in the case of an
NBD image which does not have a backing chain from qemu's point of
view, EVERY block status operation comes back as being local, and the
COR has nothing further to do - so the performance penalty is because
of the extra time spent on that block status call, particularly if
that results in another round trip NBD command over the wire before
any reading happens.

> So 23s is the time to beat.  (I believe that with longer delays, the
> gap between qemu and nbdkit increases in favour of qemu.)
>
> Q1: What other ideas could we explore to improve performance?

Have you played with block sizing?  (Reading the git log, you
have...)  Part of qemu's COR behavior is that for any read not found
in the qcow2 active layer, the entire cluster is copied up the
backing chain; a 512-byte client read becomes a 32k cluster read for
the default sizing.  Other block sizes may be more efficient, such as
64k or 1M per request actually sent over the wire.

> In real scenarios we'll actually want to combine cow + cache, where
> cow is caching writes and cache is caching reads:
>
>   socket <- cow filter <- cache filter       <- nbdkit
>                           cache-on-read=true    curl|vddk
>
> The cow filter is necessary to prevent changes being written back to
> the pristine source image.
>
> [...]
>
> Q2: Should we consider a "cow-on-read" flag to the cow filter (thus
> removing the need to use the cache filter at all)?

Since cow is already a form of caching (anything we touched now lives
locally, so we don't have to re-visit the original data source), yes,
it makes sense to have a cow-on-read mode that stores even reads
locally.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
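Following up on the block sizing point: if the qcow2-overlay route is
used, the cluster size of the overlay is the knob that controls how
much each copy-on-read miss pulls from the slow source.  A sketch,
reusing the delayed nbdkit source from the earlier tests (1M is just
one of the sizes worth measuring, and the overlay path is arbitrary):

$ ./nbdkit --filter=delay file /var/tmp/fedora-33.img delay-read=50ms \
    --run '
      # a 1M cluster size means each copy-on-read miss fetches 1M at once
      qemu-img create -f qcow2 -o cluster_size=1M \
        -b "$uri" -F raw /var/tmp/overlay-1M.qcow2
    '

Whether 64k, 1M or something in between wins is workload dependent:
with per-request latencies of tens of milliseconds the trade-off is
fewer round trips versus fetching data that inspection never touches,
so only measurement against the real curl/VDDK source can settle it.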