Richard W.M. Jones
2023-Feb-05 16:35 UTC
[Libguestfs] [PATCH nbdkit 0/6] curl: Use a curl handle pool
On Sat, Feb 04, 2023 at 12:34:52PM +0000, Richard W.M. Jones wrote:
> Anyway, this all seems to work, but it actually reduces performance :-(
>
> In particular this simple test slows down quite substantially:
>
>   time ./nbdkit -r -U - curl file:/var/tmp/fedora-36.img --run 'nbdcopy --no-extents -p "$uri" null:'
>
> (where /var/tmp/fedora-36.img is a 10G file).

A bit more on this ...

The slowdown is most easily observable if you apply this patch series,
test it (see command above), and then change just:

  plugin/curl/curl.c:
  -#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL
  +#define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS

Serialising requests dramatically, repeatably improves the performance!

Here are flame graphs for the two cases:

  http://oirase.annexia.org/tmp/nbdkit-parallel.svg
  http://oirase.annexia.org/tmp/nbdkit-serialize-requests.svg

These are across all cores on a 12 core / 24 thread machine. nbdkit is
somehow able to consume more total machine time in the serialize
requests case (67.75%) than in the parallel case (37.75%). nbdcopy is
taking about the same amount of time in both cases. In the parallel
case, the time spent in do_idle in the kernel dramatically increases.

My working theory is this is something to do with starvation of the
NBD multi-conn connections:

We now have multi-conn enabled, so nbdcopy will make 4 connections to
nbdkit. nbdcopy also aggressively keeps multiple requests in flight on
each connection (64 at a time).

In the serialize_requests case, each NBD connection will only handle a
single request at a time. These are shared across the 4 available
libcurl handles.

In the parallel requests case, it is highly likely that the first 4
requests on the 1st NBD connection will grab the 4 available libcurl
handles. The replies will then be sent back over the single NBD
connection. Then the next 4 requests from one of the NBD connections
will repeat the same thing.
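The starvation theory above can be illustrated with a toy model. This is only a sketch in Python, not the nbdkit C code, and it was not used in the thread. It assumes a strictly first-come first-served pool, and it assumes that requests queue up connection by connection, which is a much stronger ordering than a real server would see. The function name and its parameters are hypothetical.

```python
from collections import deque

def distinct_conns_per_round(num_conns, in_flight, num_handles, rounds):
    """Toy model of a first-come first-served handle pool.

    Returns, for each round, how many distinct connections were
    holding a handle.  Assumption: requests initially queue up
    connection by connection (all of connection 0, then 1, ...).
    """
    # Queue of waiting requests, each identified by its connection id.
    waiting = deque(conn for conn in range(num_conns)
                    for _ in range(in_flight))
    served = []
    for _ in range(rounds):
        # The pool hands its handles to whoever has waited longest.
        granted = [waiting.popleft()
                   for _ in range(min(num_handles, len(waiting)))]
        served.append(len(set(granted)))
        # The client keeps its in-flight count constant, so each
        # completed request is replaced by a new one on the same
        # connection, which joins the back of the queue.
        waiting.extend(granted)
    return served

# Parallel model: 4 connections, 64 requests in flight on each.
parallel = distinct_conns_per_round(4, 64, 4, rounds=8)
# Serialized model: at most 1 request per connection at a time.
serialized = distinct_conns_per_round(4, 1, 4, rounds=8)
print(parallel, serialized)
```

Under these assumptions the parallel case serves only one connection per round, while the serialized case serves all four, which matches the shape of the theory but says nothing about whether it is the real cause.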
Basically even though multi-conn is possible, I expect that only one
NBD connection is being fully utilised most of the time (or anyway
full use is not made of all 4 NBD connections at the same time).

To maximize throughput we want to send replies over all NBD
connections simultaneously, and serialize_requests (indirectly and
accidentally) achieves that.

I'm still adding instrumentation to see if the theory above is right,
plus I have no idea how to fix this.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org
Richard W.M. Jones
2023-Feb-05 17:09 UTC
[Libguestfs] [PATCH nbdkit 0/6] curl: Use a curl handle pool
On Sun, Feb 05, 2023 at 04:35:41PM +0000, Richard W.M. Jones wrote:
> I'm still adding instrumentation to see if the theory above is right,
> plus I have no idea how to fix this.

Turns out I didn't need to add instrumentation. Simply forcing nbdcopy
to use at most 1 request per connection (-R 1) recovers all the
performance.

  $ time ./nbdkit -r -U - curl file:/var/tmp/big --run 'nbdcopy --no-extents -R 1 -p "$uri" null:'

I still have no good idea how to solve this. Somehow I have to adjust
the libcurl handle pool so that it isn't first-come first-served, but
prefers to spread available handles across connections.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
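One possible shape for a pool that "prefers to spread available handles across connections" is sketched below. It is a single-threaded Python model, not a patch for the curl plugin, and nothing in the thread says this is the approach that was eventually taken. The class name FairHandlePool and all of its methods are hypothetical. The policy assumed here is: when a handle is free, give it to the waiting connection that currently holds the fewest handles, breaking ties by arrival order.

```python
from collections import Counter, deque

class FairHandlePool:
    """Hypothetical allocator: spread handles across connections."""

    def __init__(self, num_handles):
        self.free = num_handles   # handles not currently in use
        self.held = Counter()     # connection id -> handles held now
        self.waiting = deque()    # connection ids, in arrival order

    def request(self, conn):
        """Queue one request from connection `conn`."""
        self.waiting.append(conn)

    def release(self, conn):
        """Connection `conn` returns one handle to the pool."""
        self.held[conn] -= 1
        self.free += 1

    def grant(self):
        """Hand out free handles; return the grantees in order."""
        granted = []
        while self.free and self.waiting:
            # min() returns the first minimal element, so ties are
            # broken by arrival order (plain FIFO among equals).
            conn = min(self.waiting, key=lambda c: self.held[c])
            self.waiting.remove(conn)  # drops the oldest such entry
            self.held[conn] += 1
            self.free -= 1
            granted.append(conn)
        return granted

# 4 connections, each with 64 requests queued, competing for 4 handles.
pool = FairHandlePool(4)
for conn in range(4):
    for _ in range(64):
        pool.request(conn)
first_grants = pool.grant()
print(first_grants)
```

With the same arrival order that lets one connection take every handle under first-come first-served, this policy gives one handle to each of the four connections. A real implementation would also need locking and a way for threads to block until they are granted a handle, both of which are left out here.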