thr3ads.net - Libguestfs - Re: [Libguestfs] question on nbdkit shutdown behavior [Nov 2017]

If this information is useful, please help other people find it:
Share via:

Eric Blake

2017-Nov-14 15:48 UTC

Re: [Libguestfs] question on nbdkit shutdown behavior

On 11/14/2017 04:25 AM, Richard W.M. Jones wrote:> On Mon, Nov 13, 2017 at 12:42:48PM -0600, Eric Blake wrote:
>> I'm observing a difference in timing on nbdkit shutdown that is
>> dependent on client behavior, and I wonder if that's a bug, but I
can't
>> figure out where to patch it.
>>
>> When there is no connection present, sending SIGINT to nbdkit
terminates
>> immediately.
>>
>> If, on the other hand, there is a single client currently connected,
the
>> termination behavior on SIGINT depends on what the client has done: if
>> the client is currently silent with regards to issuing
>> transmission-phase transactions, nbdkit hangs for 5 seconds, then
>> forcibly tears down the connection:
> 
> I'm guessing it's because of commit
63f0eb0889c8f8a82ba06a02a8a92d695902baad
> which I added to fix a race in plugin_cleanup().  See also:
> https://www.redhat.com/archives/libguestfs/2017-September/msg00226.html
> 
> [...]
>> Why does current traffic from the client cause the plugin to be torn
>> down faster?  Does it matter?
> 
> I believe because the main loop checks the !quit flag if there
> is traffic on the connection.
> 
> There is most likely to be a better fix for the race than 63f0eb0889.
> I added that as a quick workaround for the segfault we saw in the
> tests.  Perhaps we should actively cancel the threads on shutdown
> instead of waiting?
If the plugin is still responding to an active request, and it is not
expecting thread cancellation, that could be bad (writing code that is
safe in the presence of pthread_cancel is not always trivial).  But the
scenario I described is when there are no active requests at the moment,
so the threads are under our control.  A better way might be to have a
read-from-self pipe that we write on SIGINT, and a select() that checks
for activity both on the read-from-self pipe and from the client, so
that we can react to either situation immediately.  But then we still
have to be careful to (try) and let the pending plugin active requests
complete (and merely reject any new requests from the client during that
time), before giving up completely.  In other words, I don't have a
problem with waiting 5 seconds for SIGINT to have an effect in some
cases, so much as a problem of explaining why we still have to wait even
when there is no reason for it.

At any rate, for now I'm just documenting the issues, rather than
planning on tackling the code associated with the problem; it will be a
task (for someone else?) down the road.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Richard W.M. Jones

2017-Nov-14 15:56 UTC

head link

Re: [Libguestfs] question on nbdkit shutdown behavior

A deeper problem I think is that it's ill-specified what is
supposed to happen when nbdkit receives a termination signal.
We don't specify (for example) that plugin callbacks could or
could not be killed in the middle.  Killing them in the middle
of a callback could be bad for some plugins.

Going back to the original problem, perhaps we could have fixed
it instead by acquiring a lock around all plugin callbacks, and
ensuring that plugin_cleanup also acquires the same lock.  The
lock would normally be uncontended so it shouldn't cause any
performance issues.  That would be a fairly simple revert of
63f0eb088 + alternate fix.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

Richard W.M. Jones

2017-Nov-14 16:02 UTC

head link

Re: [Libguestfs] question on nbdkit shutdown behavior

On Tue, Nov 14, 2017 at 03:56:59PM +0000, Richard W.M. Jones
wrote:> Going back to the original problem, perhaps we could have fixed
> it instead by acquiring a lock around all plugin callbacks, and
> ensuring that plugin_cleanup also acquires the same lock.  The
> lock would normally be uncontended so it shouldn't cause any
> performance issues.  That would be a fairly simple revert of
> 63f0eb088 + alternate fix.
No actually that would serialize everything.  Some sort of mutex
primitive, I think a shared/exclusive lock with the exclusive lock
being taken in plugin_cleanup.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

Maybe Matching Threads

Search for more seemingly similar threads

Libguestfs - Nov 2017 - Re: question on nbdkit shutdown behavior

Re: [Libguestfs] question on nbdkit shutdown behavior

Re: [Libguestfs] question on nbdkit shutdown behavior

Re: [Libguestfs] question on nbdkit shutdown behavior

Maybe Matching Threads