Richard W.M. Jones
2017-Nov-21 21:21 UTC
Re: [Libguestfs] [nbdkit PATCH v2 0/4] enable parallel nbd forwarding
This works OK on x86_64, but fails on our fast new Amberwing (aarch64)
machine.  I've attached the test-suite.log file, but I'm not very sure
what's going wrong from that.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
Richard W.M. Jones
2017-Nov-21 21:32 UTC
Re: [Libguestfs] [nbdkit PATCH v2 0/4] enable parallel nbd forwarding
On Tue, Nov 21, 2017 at 09:21:59PM +0000, Richard W.M. Jones wrote:
> This works OK on x86_64, but fails on our fast new Amberwing (aarch64)
> machine.  I've attached the test-suite.log file, but I'm not very sure
> what's going wrong from that.

... although not every time.  It seems to be an intermittent race.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org
Eric Blake
2017-Nov-21 21:48 UTC
Re: [Libguestfs] [nbdkit PATCH v2 0/4] enable parallel nbd forwarding
On 11/21/2017 03:21 PM, Richard W.M. Jones wrote:
> This works OK on x86_64, but fails on our fast new Amberwing (aarch64)
> machine.  I've attached the test-suite.log file, but I'm not very sure
> what's going wrong from that.

I'll see what I can spot...

> FAIL: test-parallel-file.sh
> ==========================
>
> read 1/1 bytes at offset 0
> 1 bytes, 1 ops; 0.0007 sec (1.340 KiB/sec and 1371.7421 ops/sec)
> wrote 1/1 bytes at offset 1
> 1 bytes, 1 ops; 0.0203 sec (49.193231 bytes/sec and 49.1932 ops/sec)
> /var/tmp/nbdkit-1.1.18/src/nbdkit -v -t 1 -U -
> /var/tmp/nbdkit-1.1.18/plugins/file/.libs/nbdkit-file-plugin.so file=file-data rdelay=2 wdelay=1 --run
> $QEMU_IO -f raw -c "aio_read 0 1" -c "aio_write -P 2 1 1" -c aio_flush $nbd
>
> nbdkit: file[1]: debug: handshake complete, processing requests serially

When run serially,

> read 1/1 bytes at offset 0
> 1 bytes, 1 ops; 0:00:02.00 (0.499933 bytes/sec and 0.4999 ops/sec)
> wrote 1/1 bytes at offset 1
> 1 bytes, 1 ops; 0:00:03.01 (0.331378 bytes/sec and 0.3314 ops/sec)

the read takes 2 seconds, and the write takes an additional second.

> nbdkit: debug: /var/tmp/nbdkit-1.1.18/plugins/file/.libs/nbdkit-file-plugin.so: unload
> /var/tmp/nbdkit-1.1.18/src/nbdkit -v -U -
> /var/tmp/nbdkit-1.1.18/plugins/file/.libs/nbdkit-file-plugin.so file=file-data rdelay=2 wdelay=1 --run
> $QEMU_IO -f raw -c "aio_read 0 1" -c "aio_write -P 2 1 1" -c aio_flush $nbd
>

When run with threads,

> nbdkit: file[1]: debug: handshake complete, processing requests with 16 threads
> nbdkit: debug: starting worker thread file.0
> nbdkit: debug: starting worker thread file.1
> nbdkit: file.0: debug: acquire unload prevention lock
> nbdkit: file.0: debug: pread count=1 offset=0
> nbdkit: debug: starting worker thread file.2
> nbdkit: file.1: debug: acquire unload prevention lock
> nbdkit: file.1: debug: pwrite count=1 offset=1

there are definitely two threads that both obtain the unload prevention
lock at once (so we do have parallel processing),

> nbdkit: file.1: debug: flush
> nbdkit: file.0: debug: release unload prevention lock
> nbdkit: file.1: debug: release unload prevention lock
> nbdkit: file.2: debug: acquire unload prevention lock
> nbdkit: file.2: debug: flush
> nbdkit: file.2: debug: release unload prevention lock
> nbdkit: file.4: debug: acquire unload prevention lock
> nbdkit: file.4: debug: flush
> nbdkit: file.4: debug: release unload prevention lock

and then, because we have not wired up nbdkit to expose FUA support to
the plugins, we are calling the flush callback in parallel through three
threads at once (probably way more flushing than we need, but doesn't
seem to be the cause of the failure).  I don't know how many of those
flushes were triggered by nbdkit connections.c and how many were
triggered by NBD_CMD_FLUSH from qemu-io.

> nbdkit: file.0: debug: client closed input socket, closing connection
> nbdkit: file.0: debug: exiting worker thread file.0
> read 1/1 bytes at offset 0
> 1 bytes, 1 ops; 0:00:02.47 (0.403374 bytes/sec and 0.4034 ops/sec)
> wrote 1/1 bytes at offset 1
> 1 bytes, 1 ops; 0:00:02.55 (0.391644 bytes/sec and 0.3916 ops/sec)

qemu-io finally records the replies it receives, and says that the read
took 2.4 seconds, then the write took an additional .08 seconds beyond
the read.  The two commands were obviously in flight in parallel
(because the wdelay is 1 second; as evidenced by the earlier results),
but there must be enough lock contention or other interaction between
the threads that we did not manage to get the write complete before the
read.
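For reference, the pass/fail decision here comes down to the order of
qemu-io's result lines.  A paraphrased sketch of the check (paths
illustrative, not the literal test script) would be:

  # Run the parallel (default) server and capture qemu-io's output.
  out=$(nbdkit -U - plugins/file/.libs/nbdkit-file-plugin.so \
          file=file-data rdelay=2 wdelay=1 \
          --run 'qemu-io -f raw -c "aio_read 0 1" -c "aio_write -P 2 1 1" -c aio_flush $nbd')

  # With parallel request processing the 1-second write should complete
  # before the 2-second read, so "wrote" must be printed before "read".
  case $out in
    *wrote*read*) : ;;   # PASS - the write overtook the read
    *) echo "requests appear to have completed serially"; exit 1 ;;
  esac

In other words, the test only passes if the 1-second write overtakes the
2-second read, which is exactly what did not happen above.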
When I run the same test locally, and get a PASS result, my qemu-io
results look more like:

  wrote 1/1 bytes at offset 1
  1 bytes, 1 ops; 0:00:01.00 (0.994919 bytes/sec and 0.9949 ops/sec)
  read 1/1 bytes at offset 0
  1 bytes, 1 ops; 0:00:02.00 (0.499855 bytes/sec and 0.4999 ops/sec)

Hmm - I wonder - could it be extra delay induced by the flush that
happens to make the write delay longer than the read?  Should I swap the
commands (have read be 2 seconds, write be 1 second, so that any
flushing tied to the write is more likely to still finish before the
read)?

Also, this makes it VERY obvious that without timestamps in our
nbdkit -v output, it's hard to tell WHEN things are happening, to see
where the bottlenecks are.

> FAIL: test-parallel-nbd.sh
> =========================
>
> nbdkit: file[1]: debug: release unload prevention lock
> nbdkit: file[1]: debug: handshake complete, processing requests with 16 threads
> nbdkit: nbd[1]: nbdkit: debug: debug: acquire unload prevention lockstarting worker thread file.0
>
> nbdkit: nbd[1]: nbdkit: debug: debug: get_sizestarting worker thread file.2
>
> nbdkit: nbd[1]: debug: can_write

Here, we have a nice demonstration that commit d02d9c9d works for
messages from one process (my debugging was worse without the mutex in
errors.c), but makes no difference when interleaving messages from
multiple processes.  If we're that worried about it, we could make
errors.c create the entire message into a single buffer (prologue, body,
and \n), and only then have a single stdio call on that giant buffer, to
make cross-process interleaving less likely, at the expense of more
memory manipulation to build up the error message into a temporary
buffer.  I'm not too fussed, though - you aren't often trying to debug
two nbdkit processes writing to the same file at once.

> /var/tmp/nbdkit-1.1.18/src/nbdkit -v -U -
> /var/tmp/nbdkit-1.1.18/plugins/nbd/.libs/nbdkit-nbd-plugin.so socket=test-parallel-nbd.sock --run
> $QEMU_IO -f raw -c "aio_read 0 1" -c "aio_write -P 2 1 1" -c aio_flush $nbd
>

And when we run the parallel nbd plugin over the parallel file plugin,
we see:

> nbdkit: nbd[1]: debug: handshake complete, processing requests with 16 threads
> nbdkit: debug: starting worker thread file.1
> nbdkit: debug: nbdkit: starting worker thread file.10debug:
> starting worker thread nbd.0
> nbdkit: debug: starting worker thread nbd.2
> nbdkit: debug: starting worker thread nbd.3
> nbdkit: nbdkit: debug: debug: starting worker thread file.4starting worker thread nbd.1
>
> nbdkit: nbd.3: debug: acquire unload prevention lock
> nbdkit: nbd.3nbdkit: : debug: debug: starting worker thread file.12pread count=1 offset=0
>
> nbdkit: nbd.3: debug: sending request with type 0 and cookie 0xa00000009

The nbd plugin sends the read request first,

> nbdkit: debug: nbdkit: starting worker thread file.6debug:
> starting worker thread nbd.4
> nbdkit: nbdkit: file.0debug: : starting worker thread nbd.6debug:
> acquire unload prevention lock
> nbdkit: file.0: debug: pread count=1 offset=0

which the file plugin starts processing;

> nbdkit: nbd.0: debug: acquire unload prevention lock
> nbdkit: nbdkit: debug: nbd.0starting worker thread file.15:
> debug: pwrite count=1 offset=1
> nbdkit: debug: starting worker thread nbd.5
> nbdkit: debug: starting worker thread file.8nbdkit:
> nbd.0: debug: sending request with type 1 and cookie 0xc0000000b

then in another thread nbd sends the write request,

> nbdkit: debug: starting worker thread file.14
> nbdkit: file.6: nbdkit: debug: debug: acquire unload prevention lockstarting worker thread
> nbd.7
>
> nbdkit: file.6: debug: pwrite count=1 offset=1

which the file plugin starts processing.  So both processes are indeed
interleaved, as desired.

> nbdkit: debug: starting worker thread nbd.8
> nbdkit: debug: starting worker thread nbd.9
> nbdkit: debug: starting worker thread nbd.10
> nbdkit: debug: starting worker thread nbd.12
> nbdkit: debug: starting worker thread nbd.13
> nbdkit: debug: starting worker thread nbd.14
> nbdkit: debug: starting worker thread nbd.11
> nbdkit: debug: starting worker thread nbd.15
> nbdkit: file.6: debug: release unload prevention lock
> nbdkit: debug: received reply for cookie 0xc0000000b

And the write completes first, as desired.

> nbdkit: nbd.0: debug: flush
> nbdkit: nbd.0: debug: sending request with type 3 and cookie 0xc0000000b
> nbdkit: file.5: debug: acquire unload prevention lock
> nbdkit: file.5: debug: flush
> nbdkit: file.0: debug: release unload prevention lock
> nbdkit: debug: received reply for cookie 0xa00000009
> nbdkit: nbd.3: debug: release unload prevention lock
> nbdkit: file.5: debug: release unload prevention lock
> nbdkit: debug: received reply for cookie 0xc0000000b
> nbdkit: nbd.0: debug: release unload prevention lock
> nbdkit: nbd.6: debug: acquire unload prevention lock
> nbdkit: nbd.6: debug: flush
> nbdkit: nbd.6: debug: sending request with type 3 and cookie 0xa00000009
> nbdkit: file.7: debug: acquire unload prevention lock
> nbdkit: file.7: debug: flush
> nbdkit: file.7: debug: release unload prevention lock
> nbdkit: debug: received reply for cookie 0xa00000009
> nbdkit: nbd.6: debug: release unload prevention lock
> nbdkit: nbd.2: debug: acquire unload prevention lock
> nbdkit: nbd.2: debug: flush
> nbdkit: nbd.2: debug: sending request with type 3 and cookie 0xa00000009
> nbdkit: file.9: debug: acquire unload prevention lock
> nbdkit: file.9: debug: flush
> nbdkit: file.9: debug: release unload prevention lock
> nbdkit: debug: received reply for cookie 0xa00000009

but we have a lot of traffic on those flush requests,

> nbdkit: nbd.2: debug: release unload prevention lock
> nbdkit: nbd.4: debug: client sent disconnect command, closing connection
> nbdkit: nbd.4: debug: exiting worker thread nbd.4
> nbdkit: nbd.1: debug: client closed input socket, closing connection
> read 1/1 bytes at offset 0
> 1 bytes, 1 ops; 0:00:02.47 (0.403408 bytes/sec and 0.4034 ops/sec)
> wrote 1/1 bytes at offset 1
> 1 bytes, 1 ops; 0:00:02.56 (0.390498 bytes/sec and 0.3905 ops/sec)

so that qemu-io sees the read take 2.4 seconds and the write take .09
seconds longer.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Eric Blake
2017-Nov-21 22:02 UTC
Re: [Libguestfs] [nbdkit PATCH v2 0/4] enable parallel nbd forwarding
On 11/21/2017 03:48 PM, Eric Blake wrote:
> On 11/21/2017 03:21 PM, Richard W.M. Jones wrote:
>> This works OK on x86_64, but fails on our fast new Amberwing (aarch64)
>> machine.  I've attached the test-suite.log file, but I'm not very sure
>> what's going wrong from that.
>
> I'll see what I can spot...
>
> Hmm - I wonder - could it be extra delay induced by the flush that
> happens to make the write delay longer than the read?  Should I swap the
> commands (have read be 2 seconds, write be 1 second, so that any
> flushing tied to the write is more likely to still finish before the
> read)?

That should read: if we rewrite the test to use

  file ... rdelay=1 wdelay=2

coupled with

  qemu-io -c 'aio_write ...' -c 'aio_read ...'

instead of the current

  file ... rdelay=2 wdelay=1
  qemu-io -c 'aio_read ...' -c 'aio_write ...'

then the read (which probably does not have to flush) is more likely to
complete more quickly than the write, but only if the read gets issued
in parallel.  Want me to submit a patch along those lines?

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
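For concreteness, a sketch of what the reworked test invocation could
look like (paths illustrative; flags follow the command lines quoted
earlier in the thread):

  nbdkit -U - plugins/file/.libs/nbdkit-file-plugin.so \
         file=file-data rdelay=1 wdelay=2 \
         --run 'qemu-io -f raw -c "aio_write -P 2 1 1" -c "aio_read 0 1" -c aio_flush $nbd'

  # Expected ordering with a parallel server: the 1-second read completes
  # (and its "read 1/1 bytes" line is printed) before the 2-second write
  # plus any flush overhead, so "read" should now precede "wrote" in the
  # qemu-io output.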