I'm observing a difference in timing on nbdkit shutdown that is
dependent on client behavior, and I wonder if that's a bug, but I can't
figure out where to patch it.

When there is no connection present, sending SIGINT to nbdkit terminates
immediately.

If, on the other hand, there is a single client currently connected, the
termination behavior on SIGINT depends on what the client has done: if
the client is currently silent with regards to issuing
transmission-phase transactions, nbdkit hangs for 5 seconds, then
forcibly tears down the connection:

term1$ ./nbdkit -f -v -n file file=TODO
...
nbdkit: debug: bound to IP address <any>:10809 (2 socket(s))

term2$ qemu-io -f raw nbd://localhost:10809/ -r
qemu-io>

term1$ ^C
nbdkit: debug: waiting for running threads to complete
nbdkit: debug: waited 5s for running threads to complete
nbdkit: debug: /home/eblake/nbdkit/plugins/file/.libs/nbdkit-file-plugin.so: unload

term2$
qemu-io> r 0 1
read failed: Input/output error
qemu-io> q

But, if the client makes ANY transaction during that window, then
nbdkit lets that transaction succeed, then tears down the connection
without any further wait:

term1$ ./nbdkit -f -v -n file file=TODO
...
nbdkit: debug: bound to IP address <any>:10809 (2 socket(s))

term2$ qemu-io -f raw nbd://localhost:10809/ -r
qemu-io> r 0 1     # wait to hit Enter until right after ^C in term1

term1$ ^C
nbdkit: debug: waiting for running threads to complete

term2$ # hit enter

term1$
nbdkit: file[1]: debug: acquire per-connection request lock
nbdkit: file[1]: debug: pread count=1 offset=0
nbdkit: file[1]: debug: release per-connection request lock
nbdkit: file[1]: debug: close
nbdkit: debug: waited 2s for running threads to complete
nbdkit: debug: /home/eblake/nbdkit/plugins/file/.libs/nbdkit-file-plugin.so: unload

term2$
qemu-io> r 0 1
read 1/1 bytes at offset 0
1 bytes, 1 ops; 0.0002 sec (4.138 KiB/sec and 4237.2881 ops/sec)

Why does current traffic from the client cause the plugin to be torn
down faster?  Does it matter?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Richard W.M. Jones
2017-Nov-14 10:25 UTC
Re: [Libguestfs] question on nbdkit shutdown behavior
On Mon, Nov 13, 2017 at 12:42:48PM -0600, Eric Blake wrote:
> I'm observing a difference in timing on nbdkit shutdown that is
> dependent on client behavior, and I wonder if that's a bug, but I can't
> figure out where to patch it.
>
> When there is no connection present, sending SIGINT to nbdkit terminates
> immediately.
>
> If, on the other hand, there is a single client currently connected, the
> termination behavior on SIGINT depends on what the client has done: if
> the client is currently silent with regards to issuing
> transmission-phase transactions, nbdkit hangs for 5 seconds, then
> forcibly tears down the connection:

I'm guessing it's because of commit 63f0eb0889c8f8a82ba06a02a8a92d695902baad
which I added to fix a race in plugin_cleanup().  See also:

https://www.redhat.com/archives/libguestfs/2017-September/msg00226.html

[...]
> Why does current traffic from the client cause the plugin to be torn
> down faster?  Does it matter?

I believe it's because the main loop only checks the !quit flag when
there is traffic on the connection.

There is most likely a better fix for the race than 63f0eb0889.  I
added that as a quick workaround for the segfault we saw in the tests.
Perhaps we should actively cancel the threads on shutdown instead of
waiting?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
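(For context: judging from the "waiting for running threads to complete"
debug messages in the transcripts above, the workaround in commit
63f0eb0889 amounts to a bounded wait for still-running connection
threads before the plugin is unloaded.  The following is only an
illustrative sketch of that idea, not the actual nbdkit source;
count_running_threads() and the hard-coded 5-second cap are
placeholders invented for the example.)

/* Illustrative sketch only, not nbdkit code: wait a bounded time for
 * in-flight connection threads before dlclose()ing the plugin,
 * mirroring the "waited Ns for running threads to complete" debug
 * output seen above.  count_running_threads() is a hypothetical
 * helper that reports how many connection threads are still alive.
 */
#include <stdio.h>
#include <unistd.h>

extern int count_running_threads (void);

static void
wait_for_running_threads (void)
{
  int waited = 0;

  fprintf (stderr,
           "nbdkit: debug: waiting for running threads to complete\n");
  while (count_running_threads () > 0 && waited < 5) {
    sleep (1);
    waited++;
  }
  fprintf (stderr,
           "nbdkit: debug: waited %ds for running threads to complete\n",
           waited);

  /* Only after this is it reasonably safe to unload the plugin.  If
   * the wait timed out, the connection is torn down forcibly, which
   * is what an idle client observes as the 5 second hang.
   */
}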
On 11/14/2017 04:25 AM, Richard W.M. Jones wrote:
> On Mon, Nov 13, 2017 at 12:42:48PM -0600, Eric Blake wrote:
>> I'm observing a difference in timing on nbdkit shutdown that is
>> dependent on client behavior, and I wonder if that's a bug, but I can't
>> figure out where to patch it.
>>
>> When there is no connection present, sending SIGINT to nbdkit terminates
>> immediately.
>>
>> If, on the other hand, there is a single client currently connected, the
>> termination behavior on SIGINT depends on what the client has done: if
>> the client is currently silent with regards to issuing
>> transmission-phase transactions, nbdkit hangs for 5 seconds, then
>> forcibly tears down the connection:
>
> I'm guessing it's because of commit 63f0eb0889c8f8a82ba06a02a8a92d695902baad
> which I added to fix a race in plugin_cleanup().  See also:
> https://www.redhat.com/archives/libguestfs/2017-September/msg00226.html
>
> [...]
>> Why does current traffic from the client cause the plugin to be torn
>> down faster?  Does it matter?
>
> I believe it's because the main loop only checks the !quit flag when
> there is traffic on the connection.
>
> There is most likely a better fix for the race than 63f0eb0889.
> I added that as a quick workaround for the segfault we saw in the
> tests.  Perhaps we should actively cancel the threads on shutdown
> instead of waiting?

If the plugin is still responding to an active request, and it is not
expecting thread cancellation, that could be bad (writing code that is
safe in the presence of pthread_cancel is not always trivial).  But the
scenario I described is one where there are no active requests at the
moment, so the threads are under our control.

A better approach might be to have a read-from-self pipe that we write
to on SIGINT, and a select() that checks for activity both on the
read-from-self pipe and on the client socket, so that we can react to
either situation immediately.  But then we still have to be careful to
(try to) let pending plugin requests complete (and merely reject any
new requests from the client during that time) before giving up
completely.

In other words, I don't have a problem with waiting 5 seconds for
SIGINT to take effect in some cases, so much as a problem explaining
why we still have to wait even when there is no reason for it.

At any rate, for now I'm just documenting the issues, rather than
planning on tackling the code associated with the problem; it will be a
task (for someone else?) down the road.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
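(Below is a minimal, self-contained sketch of the self-pipe-plus-select()
idea described in the message above, for illustration only.  It is not
nbdkit code; the names quit_pipe, wait_for_request_or_quit and client_fd
are invented for the example, and error handling is kept to the bare
minimum.)

/* A SIGINT handler writes one byte to a pipe, and the per-connection
 * loop waits on both the client socket and the pipe, so a quit request
 * is noticed immediately even when the client is idle.
 */
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static int quit_pipe[2];        /* [0] = read end, [1] = write end */

static void
handle_sigint (int sig)
{
  char c = 1;
  (void) sig;
  /* write() is async-signal-safe; short writes/errors are harmless here. */
  (void) write (quit_pipe[1], &c, 1);
}

/* Returns nonzero if the client sent something, 0 if we should quit. */
static int
wait_for_request_or_quit (int client_fd)
{
  fd_set rfds;
  int maxfd = client_fd > quit_pipe[0] ? client_fd : quit_pipe[0];

  FD_ZERO (&rfds);
  FD_SET (client_fd, &rfds);
  FD_SET (quit_pipe[0], &rfds);

  if (select (maxfd + 1, &rfds, NULL, NULL, NULL) == -1)
    return 0;                   /* treat EINTR/other errors as "quit" */
  if (FD_ISSET (quit_pipe[0], &rfds))
    return 0;                   /* SIGINT arrived */
  return FD_ISSET (client_fd, &rfds);
}

int
main (void)
{
  struct sigaction sa;

  if (pipe (quit_pipe) == -1)
    return 1;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = handle_sigint;
  sigaction (SIGINT, &sa, NULL);

  /* Per-connection loop (client_fd obtained from accept()):
   *
   *   while (wait_for_request_or_quit (client_fd))
   *     handle exactly one request, letting any in-flight plugin
   *     callback finish before actually closing the connection;
   */
  return 0;
}

With something along these lines, the idle-client case in the first
transcript could react to ^C right away instead of timing out after
5 seconds.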