Richard W.M. Jones
2020-Aug-05 12:28 UTC
Re: [Libguestfs] More parallelism in VDDK driver (was: Re: CFME-5.11.7.3 Perf. Tests)
Nir, BTW what are you using for performance testing?

As far as I can tell it's not possible to make qemu-img convert use
multi-conn when connecting to the source (which is going to be a
problem if we want to use this stuff in virt-v2v).  Instead I've
hacked up a copy of this program from libnbd:

https://github.com/libguestfs/libnbd/blob/master/examples/threaded-reads-and-writes.c

so that it only does reads and aligns requests to 512 bytes.

At least this is testing multi-conn, but there should be an easier way ...

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
Nir Soffer
2020-Aug-05 12:39 UTC
Re: [Libguestfs] More parallelism in VDDK driver (was: Re: CFME-5.11.7.3 Perf. Tests)
On Wed, Aug 5, 2020 at 3:28 PM Richard W.M. Jones <rjones@redhat.com> wrote:
>
> Nir, BTW what are you using for performance testing?

virt-v2v with local image, or imageio client with local image.

> As far as I can tell it's not possible to make qemu-img convert use
> multi-conn when connecting to the source (which is going to be a
> problem if we want to use this stuff in virt-v2v).

But do we need multiple connections? qemu can send multiple requests
on one connection.

Did you try to copy an image from one nbdkit file plugin to another
nbdkit file plugin using qemu-img convert?

    nbdkit file plugin -> qemu-img convert -W nbd:///?socket=src.sock
    nbd:///?socket=dst.sock -> nbdkit file plugin

I did not try it, but I will be surprised if we don't get all 8 threads
busy on both sides.

The reason we use multiple connections in imageio is that we don't
support async I/O in the http client, http server, and nbd client, and
it is much easier to open a new connection with the entire stack
compared to rewriting the http server and nbd client.

It is also much harder to provide an easy-to-use interface for users
supporting async I/O.

> Instead I've hacked up a copy of this program from libnbd:
>
> https://github.com/libguestfs/libnbd/blob/master/examples/threaded-reads-and-writes.c
>
> so that it only does reads and aligns requests to 512 bytes.
>
> At least this is testing multi-conn, but there should be an easier way ...
>
> Rich.
Richard W.M. Jones
2020-Aug-05 12:47 UTC
Re: [Libguestfs] More parallelism in VDDK driver (was: Re: CFME-5.11.7.3 Perf. Tests)
Here are some results anyway.  The command I'm using is:

  $ ./nbdkit -r -U - vddk \
        libdir=/path/to/vmware-vix-disklib-distrib \
        user=root password='***' \
        server='***' thumbprint=aa:bb:cc:... \
        vm=moref=3 \
        file='[datastore1] Fedora 28/Fedora 28.vmdk' \
        --run 'time /var/tmp/threaded-reads $unixsocket'

Source for threaded-reads is attached.

(1) Existing nbdkit VDDK plugin.

    NR_MULTI_CONN = 1
    NR_CYCLES = 10000

    Note this is making 10,000 pread requests.

    real    1m26.103s
    user    0m0.283s
    sys     0m0.571s

(2) VDDK plugin patched to support SERIALIZE_REQUESTS.

    NR_MULTI_CONN = 1
    NR_CYCLES = 10000

    Note this is making 10,000 pread requests.

    real    1m26.755s
    user    0m0.230s
    sys     0m0.539s

(3) VDDK plugin same as in (2).

    NR_MULTI_CONN = 8
    NR_CYCLES = 10000

    Note this is making 80,000 pread requests in total.

    real    7m11.729s
    user    0m2.891s
    sys     0m6.037s

My observations:

Tests (1) and (2) are about the same within noise.

Test (3) is making 8 times as many requests as test (1), so I think
it's fair to compare against 8 x the time taken by test (1) (ie. the
time it would have taken to make 80,000 requests):

  Test (1) * 8 =  11m28
  Test (3)     =   7m11

So if we had a client which could actually use multi-conn then this
would be a reasonable win.  It seems like there's still a lot of
locking going on somewhere, perhaps inside VDDK or in the server.
It's certainly nowhere near a linear speedup.

The patch does at least seem stable.  I'll post it in a minute.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org
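[Editor's note: the comparison in the message above can be checked with a quick back-of-the-envelope calculation. This is just a sketch using the timings quoted from the test output; only the "real" wall-clock times are used.]

```python
# Back-of-the-envelope check of the multi-conn comparison above.
# Test (1): 10,000 reads on 1 connection took 1m26.103s, so 80,000
# reads serialized on one connection would take roughly 8x that.
serialized = 8 * (1 * 60 + 26.103)   # ~688.8s, i.e. ~11m28

# Test (3): 80,000 reads over 8 connections took 7m11.729s.
multi_conn = 7 * 60 + 11.729         # ~431.7s

speedup = serialized / multi_conn
print(f"effective speedup: {speedup:.2f}x")  # ~1.6x, far from a linear 8x
```

This makes the non-linearity concrete: 8 connections buy roughly a 1.6x effective speedup, consistent with heavy locking somewhere in VDDK or the server.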
Richard W.M. Jones
2020-Aug-05 13:23 UTC
Re: [Libguestfs] More parallelism in VDDK driver (was: Re: CFME-5.11.7.3 Perf. Tests)
On Wed, Aug 05, 2020 at 03:39:15PM +0300, Nir Soffer wrote:
> On Wed, Aug 5, 2020 at 3:28 PM Richard W.M. Jones <rjones@redhat.com> wrote:
> >
> > Nir, BTW what are you using for performance testing?
>
> virt-v2v with local image, or imageio client with local image.
>
> > As far as I can tell it's not possible to make qemu-img convert use
> > multi-conn when connecting to the source (which is going to be a
> > problem if we want to use this stuff in virt-v2v).
>
> But do we need multiple connections? qemu can send multiple requests
> on one connection.

As implemented now there is only one VDDK handle per connection, and
VDDK doesn't allow multiple requests at the same time on one handle,
so with a single NBD connection everything will be serialized.

Now if we were to implement a thread pool inside nbdkit-vddk-plugin we
could get around that restriction (with a lot of complexity).  But the
results I posted a moment ago show that we wouldn't get anything like
a linear speed up.  It hardly seems worth it to me.

> Did you try to copy an image from one nbdkit file plugin to another
> nbdkit file plugin using qemu-img convert?
>
>     nbdkit file plugin -> qemu-img convert -W nbd:///?socket=src.sock
>     nbd:///?socket=dst.sock -> nbdkit file plugin
>
> I did not try it, but I will be surprised if we don't get all 8 threads
> busy on both sides.

Likely, but that's because the file plugin uses PARALLEL as its thread
model, so even on a single NBD connection it can keep all the threads
inside nbdkit busy.

> The reason we use multiple connections in imageio is that we don't
> support async I/O in the http client, http server, and nbd client, and
> it is much easier to open a new connection with the entire stack
> compared to rewriting the http server and nbd client.
>
> It is also much harder to provide an easy-to-use interface for users
> supporting async I/O.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
On 8/5/20 7:47 AM, Richard W.M. Jones wrote:
> Here are some results anyway.  The command I'm using is:
>
>   $ ./nbdkit -r -U - vddk \
>         libdir=/path/to/vmware-vix-disklib-distrib \
>         user=root password='***' \
>         server='***' thumbprint=aa:bb:cc:... \
>         vm=moref=3 \
>         file='[datastore1] Fedora 28/Fedora 28.vmdk' \
>         --run 'time /var/tmp/threaded-reads $unixsocket'
>
> Source for threaded-reads is attached.
>
> Tests (1) and (2) are about the same within noise.
>
> Test (3) is making 8 times as many requests as test (1), so I think
> it's fair to compare against 8 x the time taken by test (1) (ie. the
> time it would have taken to make 80,000 requests):
>
>   Test (1) * 8 =  11m28
>   Test (3)     =   7m11
>
> So if we had a client which could actually use multi-conn then this
> would be a reasonable win.  It seems like there's still a lot of
> locking going on somewhere, perhaps inside VDDK or in the server.
> It's certainly nowhere near a linear speedup.

If I'm reading
https://code.vmware.com/docs/11750/virtual-disk-development-kit-programming-guide/GUID-6BE903E8-DC70-46D9-98E4-E34A2002C2AD.html
correctly, we cannot reuse a single VDDK handle for two parallel
requests, but we CAN have two VDDK handles open, with requests in
flight on both handles at the same time.  That's what
SERIALIZE_REQUESTS buys us: a client that opens multiple connections
(taking advantage of multi-conn) now has two NBD handles and therefore
two VDDK handles, and the reads spread across those two handles can
proceed in parallel.  But I also don't see anything that prevents a
single NBD connection from opening multiple VDDK handles under the
hood, or even from having all of those handles opened as coordinated
through a single helper thread.

That is, if nbdkit were to provide a way for a plugin to know the
maximum number of threads that will be used in parallel, then vddk's
.after_fork could spawn a dedicated thread for running VDDK handle
open/close requests (protected by a mutex); the .open callback could
then obtain the mutex and call into the helper thread to open N
handles; the thread model would be advertised as PARALLEL; and in all
other calls (.pread, .pwrite, ...) we would pick any one of the N
handles for that NBD connection that is not currently in use.  The
client application would not even have to take advantage of
multi-conn, but would get the full benefit of out-of-order thread
access for a parallel speedup.

> The patch does at least seem stable.  I'll post it in a minute.

Whether we do all VixDiskLib_Open calls from a single dedicated helper
thread created during .after_fork, or rely on pthread mutex locking so
that at most one .open is calling Open or Close at a time, is a
separate question from whether we open multiple VDDK handles per
single NBD connection in PARALLEL mode, vs. one VDDK handle per NBD
connection in SERIALIZE_REQUESTS (offloading the parallelism to the
multi-conn client).  We could probably do both; but while opening
multiple VDDK handles per NBD connection appears to be fully compliant
with the VDDK docs (patch not written yet), the former (using a mutex
to serialize open calls, where open is not always happening on the
same thread) is indeed risky.  And having the speedup available to all
clients, not just multi-conn aware clients, seems like it might be
nice to have, even if the speedup is not linear.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
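[Editor's note: the per-connection handle-pool idea in the message above can be sketched roughly as follows. This is a hypothetical illustration in Python rather than the plugin's C code: `open_handle` is a stand-in for VixDiskLib_Open, which in the real plugin would be funneled through the dedicated helper thread created in .after_fork.]

```python
import threading

class HandlePool:
    """Pool of N pre-opened handles for one NBD connection.

    Each .pread/.pwrite callback borrows any handle that is not
    currently in use; if all N are busy, the caller blocks until
    another thread releases one.
    """

    def __init__(self, n, open_handle):
        self.cond = threading.Condition()
        # In the real plugin these opens would go through the
        # dedicated open/close helper thread, not happen inline.
        self.free = [open_handle(i) for i in range(n)]

    def acquire(self):
        with self.cond:
            while not self.free:
                self.cond.wait()
            return self.free.pop()

    def release(self, handle):
        with self.cond:
            self.free.append(handle)
            self.cond.notify()

# Hypothetical use from a .pread callback (strings stand in for
# real VDDK handles):
pool = HandlePool(4, open_handle=lambda i: f"vddk-handle-{i}")
h = pool.acquire()
try:
    pass  # ... the VixDiskLib_Read call would go here ...
finally:
    pool.release(h)
```

The point of the condition variable is that no request is ever issued on a handle another thread is using, which is the constraint the VDDK docs impose, while still letting up to N requests proceed in parallel on one NBD connection.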
Nir Soffer
2020-Aug-05 14:40 UTC
Re: [Libguestfs] More parallelism in VDDK driver (was: Re: CFME-5.11.7.3 Perf. Tests)
On Wed, Aug 5, 2020 at 3:47 PM Richard W.M. Jones <rjones@redhat.com> wrote:
>
> Here are some results anyway.  The command I'm using is:
>
>   $ ./nbdkit -r -U - vddk \
>         libdir=/path/to/vmware-vix-disklib-distrib \
>         user=root password='***' \
>         server='***' thumbprint=aa:bb:cc:... \
>         vm=moref=3 \
>         file='[datastore1] Fedora 28/Fedora 28.vmdk' \
>         --run 'time /var/tmp/threaded-reads $unixsocket'
>
> Source for threaded-reads is attached.
>
> (1) Existing nbdkit VDDK plugin.
>
>     NR_MULTI_CONN = 1
>     NR_CYCLES = 10000
>
>     Note this is making 10,000 pread requests.
>
>     real    1m26.103s
>     user    0m0.283s
>     sys     0m0.571s
>
> (2) VDDK plugin patched to support SERIALIZE_REQUESTS.
>
>     NR_MULTI_CONN = 1
>     NR_CYCLES = 10000
>
>     Note this is making 10,000 pread requests.
>
>     real    1m26.755s
>     user    0m0.230s
>     sys     0m0.539s
>
> (3) VDDK plugin same as in (2).
>
>     NR_MULTI_CONN = 8
>     NR_CYCLES = 10000
>
>     Note this is making 80,000 pread requests in total.
>
>     real    7m11.729s
>     user    0m2.891s
>     sys     0m6.037s
>
> My observations:
>
> Tests (1) and (2) are about the same within noise.
>
> Test (3) is making 8 times as many requests as test (1), so I think
> it's fair to compare against 8 x the time taken by test (1) (ie. the
> time it would have taken to make 80,000 requests):
>
>   Test (1) * 8 =  11m28
>   Test (3)     =   7m11

Those are pretty good results, 62% faster.

What is the request size used? I would test 1, 2, 4, 8 MiB reads.

> So if we had a client which could actually use multi-conn then this
> would be a reasonable win.  It seems like there's still a lot of
> locking going on somewhere, perhaps inside VDDK or in the server.
> It's certainly nowhere near a linear speedup.
>
> The patch does at least seem stable.  I'll post it in a minute.
>
> Rich.