Richard W.M. Jones
2016-Jan-19 16:18 UTC
[Libguestfs] [PATCH libguestfs v3] lib: Handle slow USB devices more gracefully.
Libvirt has a fixed 15 second timeout for qemu to exit. If qemu is writing to a slow USB key, it can hang (in D state) for much longer than this - many minutes usually. The solution is to check specifically for the libvirt EBUSY error when this happens, and retry the virDomainDestroyFlags operation (indefinitely). See also the description here: https://www.redhat.com/archives/libvir-list/2016-January/msg00767.html Similar to the following OpenStack Nova commit: http://git.openstack.org/cgit/openstack/nova/commit/?id=3907867 Thanks: Kashyap Chamarthy and Daniel Berrange. --- src/launch-libvirt.c | 45 +++++++++++++++++++++++++++++++++++---------- 1 file changed, 35 insertions(+), 10 deletions(-) diff --git a/src/launch-libvirt.c b/src/launch-libvirt.c index 8215e02..90b6c49 100644 --- a/src/launch-libvirt.c +++ b/src/launch-libvirt.c @@ -25,6 +25,7 @@ #include <unistd.h> #include <fcntl.h> #include <grp.h> +#include <errno.h> #include <sys/types.h> #include <sys/stat.h> #include <assert.h> @@ -2015,6 +2016,8 @@ ignore_errors (void *ignore, virErrorPtr ignore2) /* empty */ } +static int destroy_domain (guestfs_h *g, virDomainPtr dom, int check_for_errors); + static int shutdown_libvirt (guestfs_h *g, void *datav, int check_for_errors) { @@ -2023,23 +2026,14 @@ shutdown_libvirt (guestfs_h *g, void *datav, int check_for_errors) virDomainPtr dom = data->dom; size_t i; int ret = 0; - int flags; /* Note that we can be called back very early in launch (specifically * from launch_libvirt itself), when conn and dom might be NULL. */ - if (dom != NULL) { - flags = check_for_errors ? VIR_DOMAIN_DESTROY_GRACEFUL : 0; - debug (g, "calling virDomainDestroy \"%s\" flags=%s", - data->name, check_for_errors ? "VIR_DOMAIN_DESTROY_GRACEFUL" : "0"); - if (virDomainDestroyFlags (dom, flags) == -1) { - libvirt_error (g, _("could not destroy libvirt domain")); - ret = -1; - } + ret = destroy_domain (g, dom, check_for_errors); virDomainFree (dom); } - if (conn != NULL) virConnectClose (conn); @@ -2068,6 +2062,37 @@ shutdown_libvirt (guestfs_h *g, void *datav, int check_for_errors) return ret; } +/* Wrapper around virDomainDestroy which handles errors and retries.. */ +static int +destroy_domain (guestfs_h *g, virDomainPtr dom, int check_for_errors) +{ + const int flags = check_for_errors ? VIR_DOMAIN_DESTROY_GRACEFUL : 0; + virErrorPtr err; + + again: + debug (g, "calling virDomainDestroy flags=%s", + check_for_errors ? "VIR_DOMAIN_DESTROY_GRACEFUL" : "0"); + if (virDomainDestroyFlags (dom, flags) == -1) { + err = virGetLastError (); + + /* Second chance if we're just waiting for qemu to shut down. See: + * https://www.redhat.com/archives/libvir-list/2016-January/msg00767.html + */ + if ((flags & VIR_DOMAIN_DESTROY_GRACEFUL) && + err && err->code == VIR_ERR_SYSTEM_ERROR && err->int1 == EBUSY) + goto again; + + /* "Domain not found" is not treated as an error. */ + if (err && err->code == VIR_ERR_NO_DOMAIN) + return 0; + + libvirt_error (g, _("could not destroy libvirt domain")); + return -1; + } + + return 0; +} + /* Wrapper around error() which produces better errors for * libvirt functions. */ -- 2.5.0
Daniel P. Berrange
2016-Jan-19 17:00 UTC
Re: [Libguestfs] [PATCH libguestfs v3] lib: Handle slow USB devices more gracefully.
On Tue, Jan 19, 2016 at 04:18:46PM +0000, Richard W.M. Jones wrote:> Libvirt has a fixed 15 second timeout for qemu to exit. If qemu is > writing to a slow USB key, it can hang (in D state) for much longer > than this - many minutes usually. > > The solution is to check specifically for the libvirt EBUSY error when > this happens, and retry the virDomainDestroyFlags operation > (indefinitely). > > See also the description here: > https://www.redhat.com/archives/libvir-list/2016-January/msg00767.html > > Similar to the following OpenStack Nova commit: > http://git.openstack.org/cgit/openstack/nova/commit/?id=3907867 > > Thanks: Kashyap Chamarthy and Daniel Berrange. > --- > src/launch-libvirt.c | 45 +++++++++++++++++++++++++++++++++++---------- > 1 file changed, 35 insertions(+), 10 deletions(-) > > diff --git a/src/launch-libvirt.c b/src/launch-libvirt.c > index 8215e02..90b6c49 100644 > --- a/src/launch-libvirt.c > +++ b/src/launch-libvirt.c > @@ -25,6 +25,7 @@ > #include <unistd.h> > #include <fcntl.h> > #include <grp.h> > +#include <errno.h> > #include <sys/types.h> > #include <sys/stat.h> > #include <assert.h> > @@ -2015,6 +2016,8 @@ ignore_errors (void *ignore, virErrorPtr ignore2) > /* empty */ > } > > +static int destroy_domain (guestfs_h *g, virDomainPtr dom, int check_for_errors); > + > static int > shutdown_libvirt (guestfs_h *g, void *datav, int check_for_errors) > { > @@ -2023,23 +2026,14 @@ shutdown_libvirt (guestfs_h *g, void *datav, int check_for_errors) > virDomainPtr dom = data->dom; > size_t i; > int ret = 0; > - int flags; > > /* Note that we can be called back very early in launch (specifically > * from launch_libvirt itself), when conn and dom might be NULL. > */ > - > if (dom != NULL) { > - flags = check_for_errors ? VIR_DOMAIN_DESTROY_GRACEFUL : 0; > - debug (g, "calling virDomainDestroy \"%s\" flags=%s", > - data->name, check_for_errors ? "VIR_DOMAIN_DESTROY_GRACEFUL" : "0"); > - if (virDomainDestroyFlags (dom, flags) == -1) { > - libvirt_error (g, _("could not destroy libvirt domain")); > - ret = -1; > - } > + ret = destroy_domain (g, dom, check_for_errors); > virDomainFree (dom); > } > - > if (conn != NULL) > virConnectClose (conn); > > @@ -2068,6 +2062,37 @@ shutdown_libvirt (guestfs_h *g, void *datav, int check_for_errors) > return ret; > } > > +/* Wrapper around virDomainDestroy which handles errors and retries.. */ > +static int > +destroy_domain (guestfs_h *g, virDomainPtr dom, int check_for_errors) > +{ > + const int flags = check_for_errors ? VIR_DOMAIN_DESTROY_GRACEFUL : 0; > + virErrorPtr err; > + > + again: > + debug (g, "calling virDomainDestroy flags=%s", > + check_for_errors ? "VIR_DOMAIN_DESTROY_GRACEFUL" : "0"); > + if (virDomainDestroyFlags (dom, flags) == -1) { > + err = virGetLastError (); > + > + /* Second chance if we're just waiting for qemu to shut down. See: > + * https://www.redhat.com/archives/libvir-list/2016-January/msg00767.html > + */ > + if ((flags & VIR_DOMAIN_DESTROY_GRACEFUL) && > + err && err->code == VIR_ERR_SYSTEM_ERROR && err->int1 == EBUSY) > + goto again;NB, you could get that error even if you don't specify VIR_DOMAIN_DESTROY_GRACEFUL, because even SIGKILL can fail to kill QEMU if it is in an uninterruptable sleep on a dead NFS server for example and hard,nointr is used as mount options. So perhaps you don't want to check flags here ?> + > + /* "Domain not found" is not treated as an error. */ > + if (err && err->code == VIR_ERR_NO_DOMAIN) > + return 0; > + > + libvirt_error (g, _("could not destroy libvirt domain")); > + return -1; > + } > + > + return 0; > +}I don't know the use context from libguestfs POV, but is there any scenario in which a user would want a way to time out this, instead of making libguestfs wait forever ? Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Richard W.M. Jones
2016-Jan-19 17:57 UTC
Re: [Libguestfs] [PATCH libguestfs v3] lib: Handle slow USB devices more gracefully.
On Tue, Jan 19, 2016 at 05:00:27PM +0000, Daniel P. Berrange wrote:> On Tue, Jan 19, 2016 at 04:18:46PM +0000, Richard W.M. Jones wrote: > > Libvirt has a fixed 15 second timeout for qemu to exit. If qemu is > > writing to a slow USB key, it can hang (in D state) for much longer > > than this - many minutes usually. > > > > The solution is to check specifically for the libvirt EBUSY error when > > this happens, and retry the virDomainDestroyFlags operation > > (indefinitely). > > > > See also the description here: > > https://www.redhat.com/archives/libvir-list/2016-January/msg00767.html > > > > Similar to the following OpenStack Nova commit: > > http://git.openstack.org/cgit/openstack/nova/commit/?id=3907867 > > > > Thanks: Kashyap Chamarthy and Daniel Berrange. > > --- > > src/launch-libvirt.c | 45 +++++++++++++++++++++++++++++++++++---------- > > 1 file changed, 35 insertions(+), 10 deletions(-) > > > > diff --git a/src/launch-libvirt.c b/src/launch-libvirt.c > > index 8215e02..90b6c49 100644 > > --- a/src/launch-libvirt.c > > +++ b/src/launch-libvirt.c > > @@ -25,6 +25,7 @@ > > #include <unistd.h> > > #include <fcntl.h> > > #include <grp.h> > > +#include <errno.h> > > #include <sys/types.h> > > #include <sys/stat.h> > > #include <assert.h> > > @@ -2015,6 +2016,8 @@ ignore_errors (void *ignore, virErrorPtr ignore2) > > /* empty */ > > } > > > > +static int destroy_domain (guestfs_h *g, virDomainPtr dom, int check_for_errors); > > + > > static int > > shutdown_libvirt (guestfs_h *g, void *datav, int check_for_errors) > > { > > @@ -2023,23 +2026,14 @@ shutdown_libvirt (guestfs_h *g, void *datav, int check_for_errors) > > virDomainPtr dom = data->dom; > > size_t i; > > int ret = 0; > > - int flags; > > > > /* Note that we can be called back very early in launch (specifically > > * from launch_libvirt itself), when conn and dom might be NULL. > > */ > > - > > if (dom != NULL) { > > - flags = check_for_errors ? VIR_DOMAIN_DESTROY_GRACEFUL : 0; > > - debug (g, "calling virDomainDestroy \"%s\" flags=%s", > > - data->name, check_for_errors ? "VIR_DOMAIN_DESTROY_GRACEFUL" : "0"); > > - if (virDomainDestroyFlags (dom, flags) == -1) { > > - libvirt_error (g, _("could not destroy libvirt domain")); > > - ret = -1; > > - } > > + ret = destroy_domain (g, dom, check_for_errors); > > virDomainFree (dom); > > } > > - > > if (conn != NULL) > > virConnectClose (conn); > > > > @@ -2068,6 +2062,37 @@ shutdown_libvirt (guestfs_h *g, void *datav, int check_for_errors) > > return ret; > > } > > > > +/* Wrapper around virDomainDestroy which handles errors and retries.. */ > > +static int > > +destroy_domain (guestfs_h *g, virDomainPtr dom, int check_for_errors) > > +{ > > + const int flags = check_for_errors ? VIR_DOMAIN_DESTROY_GRACEFUL : 0; > > + virErrorPtr err; > > + > > + again: > > + debug (g, "calling virDomainDestroy flags=%s", > > + check_for_errors ? "VIR_DOMAIN_DESTROY_GRACEFUL" : "0"); > > + if (virDomainDestroyFlags (dom, flags) == -1) { > > + err = virGetLastError (); > > + > > + /* Second chance if we're just waiting for qemu to shut down. See: > > + * https://www.redhat.com/archives/libvir-list/2016-January/msg00767.html > > + */ > > + if ((flags & VIR_DOMAIN_DESTROY_GRACEFUL) && > > + err && err->code == VIR_ERR_SYSTEM_ERROR && err->int1 == EBUSY) > > + goto again; > > NB, you could get that error even if you don't specify > VIR_DOMAIN_DESTROY_GRACEFUL, because even SIGKILL can > fail to kill QEMU if it is in an uninterruptable sleep > on a dead NFS server for example and hard,nointr is used > as mount options. > > So perhaps you don't want to check flags here ?It's a bit confusing: As you say, testing `flags' is wrong, that part was carried over from an earlier patch. But I think I should test `check_for_errors' instead, since that flag basically means did we come from the guestfs_shutdown method -- where we want to be careful about data integrity -- or did we come from the guestfs_close method -- where the user just wants to close the connection without caring about data integrity.> > + > > + /* "Domain not found" is not treated as an error. */ > > + if (err && err->code == VIR_ERR_NO_DOMAIN) > > + return 0; > > + > > + libvirt_error (g, _("could not destroy libvirt domain")); > > + return -1; > > + } > > + > > + return 0; > > +} > > I don't know the use context from libguestfs POV, but is there any > scenario in which a user would want a way to time out this, instead > of making libguestfs wait forever ?There's no good answer, but I'll see if the need arises before making any further changes. Thanks for the review! Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Reasonably Related Threads
- [PATCH libguestfs v3] lib: Handle slow USB devices more gracefully.
- Re: [PATCH libguestfs v3] lib: Handle slow USB devices more gracefully.
- [PATCH] launch: libvirt: Implement drive secrets (RHBZ#1159016).
- VIR_ERR_OPERATION_INVALID from virDomainDestroyFlags call
- Re: VIR_ERR_OPERATION_INVALID from virDomainDestroyFlags call