--------------------------------------------------------------------------- The following patch when applied on top of: libxl: attempt to cleanup tapdisk processes on disk backend destroy. Establishes correct cleanup behavior for blktap devices. This patch implements the release of the backend device before calling for the destruction of the userspace component of the blktap device. Without this patch the kernel xen-blkback driver deadlocks with the blktap2 user control plane until the IPC channel is terminated by the timeout on the select() call. This results in a noticeable delay in the termination of the guest and causes the blktap minor number which had been allocated to be orphaned. Signed-off-by: Greg Wettstein <greg@enjellic.com> diff -r b184698e0893 tools/libxl/libxl_blktap2.c --- a/tools/libxl/libxl_blktap2.c Tue Nov 06 19:53:48 2012 -0600 +++ b/tools/libxl/libxl_blktap2.c Tue Nov 06 19:54:23 2012 -0600 @@ -59,6 +59,7 @@ void libxl__device_destroy_tapdisk(libxl char *path, *params, *type, *disk; int err; tap_list_t tap; + libxl_ctx *ctx = libxl__gc_owner(gc); path = libxl__sprintf(gc, "%s/tapdisk-params", be_path); if (!path) return; @@ -75,5 +76,11 @@ void libxl__device_destroy_tapdisk(libxl err = tap_ctl_find(type, disk, &tap); if (err < 0) return; + /* + * Remove the instance of the backend device to avoid a deadlock with the + * removal of the tap device. + */ + xs_rm(ctx->xsh, XBT_NULL, be_path); + tap_ctl_destroy(tap.id, tap.minor); } diff -r b184698e0893 tools/libxl/libxl_device.c --- a/tools/libxl/libxl_device.c Tue Nov 06 19:53:48 2012 -0600 +++ b/tools/libxl/libxl_device.c Tue Nov 06 19:54:23 2012 -0600 @@ -250,8 +250,7 @@ int libxl__device_destroy(libxl_ctx *ctx if (!state) goto out; if (atoi(state) != 4) { - libxl__device_destroy_tapdisk(&gc, be_path); - xs_rm(ctx->xsh, XBT_NULL, be_path); + libxl__device_destroy_tapdisk(&gc, be_path); goto out; } --------------------------------------------------------------------------- As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102 development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "Fungus doesn''t take a vacation." -- Rob Pike
On Wed, 2012-11-07 at 02:07 +0000, Dr. Greg Wettstein wrote:> --------------------------------------------------------------------------- > The following patch when applied on top of: > > libxl: attempt to cleanup tapdisk processes on disk backend destroy. > > Establishes correct cleanup behavior for blktap devices. This patch > implements the release of the backend device before calling for > the destruction of the userspace component of the blktap device. > > Without this patch the kernel xen-blkback driver deadlocks with > the blktap2 user control plane until the IPC channel is terminated by the > timeout on the select() call. This results in a noticeable delay > in the termination of the guest and causes the blktap minor > number which had been allocated to be orphaned. > > Signed-off-by: Greg Wettstein <greg@enjellic.com>Acked-by: Ian Campbell <ian.campbell@citrix.com>> diff -r b184698e0893 tools/libxl/libxl_blktap2.c > --- a/tools/libxl/libxl_blktap2.c Tue Nov 06 19:53:48 2012 -0600 > +++ b/tools/libxl/libxl_blktap2.c Tue Nov 06 19:54:23 2012 -0600 > @@ -59,6 +59,7 @@ void libxl__device_destroy_tapdisk(libxl > char *path, *params, *type, *disk; > int err; > tap_list_t tap; > + libxl_ctx *ctx = libxl__gc_owner(gc); > > path = libxl__sprintf(gc, "%s/tapdisk-params", be_path); > if (!path) return; > @@ -75,5 +76,11 @@ void libxl__device_destroy_tapdisk(libxl > err = tap_ctl_find(type, disk, &tap); > if (err < 0) return; > > + /* > + * Remove the instance of the backend device to avoid a deadlock with the > + * removal of the tap device. > + */ > + xs_rm(ctx->xsh, XBT_NULL, be_path); > + > tap_ctl_destroy(tap.id, tap.minor); > } > diff -r b184698e0893 tools/libxl/libxl_device.c > --- a/tools/libxl/libxl_device.c Tue Nov 06 19:53:48 2012 -0600 > +++ b/tools/libxl/libxl_device.c Tue Nov 06 19:54:23 2012 -0600 > @@ -250,8 +250,7 @@ int libxl__device_destroy(libxl_ctx *ctx > if (!state) > goto out; > if (atoi(state) != 4) { > - libxl__device_destroy_tapdisk(&gc, be_path); > - xs_rm(ctx->xsh, XBT_NULL, be_path); > + libxl__device_destroy_tapdisk(&gc, be_path); > goto out; > } > > --------------------------------------------------------------------------- > > As always, > Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. > 4206 N. 19th Ave. Specializing in information infra-structure > Fargo, ND 58102 development. > PH: 701-281-1686 > FAX: 701-281-3949 EMAIL: greg@enjellic.com > ------------------------------------------------------------------------------ > "Fungus doesn''t take a vacation." > -- Rob Pike > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Dr. Greg Wettstein writes ("[PATCH 2/2] 4.1.2 blktap2 cleanup fixes."):> --------------------------------------------------------------------------- > The following patch when applied on top of: > > libxl: attempt to cleanup tapdisk processes on disk backend destroy. > > Establishes correct cleanup behavior for blktap devices. This patch > implements the release of the backend device before calling for > the destruction of the userspace component of the blktap device. > > Without this patch the kernel xen-blkback driver deadlocks with > the blktap2 user control plane until the IPC channel is terminated by the > timeout on the select() call. This results in a noticeable delay > in the termination of the guest and causes the blktap minor > number which had been allocated to be orphaned.This looks plausible. But shouldn''t it be applied to xen-unstable and Xen 4.2 too ?> if (atoi(state) != 4) { > - libxl__device_destroy_tapdisk(&gc, be_path); > - xs_rm(ctx->xsh, XBT_NULL, be_path); > + libxl__device_destroy_tapdisk(&gc, be_path); > goto out;And this contains a whitespace change. Ian.
On Wed, 2012-12-12 at 17:46 +0000, Ian Jackson wrote:> Dr. Greg Wettstein writes ("[PATCH 2/2] 4.1.2 blktap2 cleanup fixes."): > > --------------------------------------------------------------------------- > > The following patch when applied on top of: > > > > libxl: attempt to cleanup tapdisk processes on disk backend destroy. > > > > Establishes correct cleanup behavior for blktap devices. This patch > > implements the release of the backend device before calling for > > the destruction of the userspace component of the blktap device. > > > > Without this patch the kernel xen-blkback driver deadlocks with > > the blktap2 user control plane until the IPC channel is terminated by the > > timeout on the select() call. This results in a noticeable delay > > in the termination of the guest and causes the blktap minor > > number which had been allocated to be orphaned. > > This looks plausible. But shouldn''t it be applied to xen-unstable and > Xen 4.2 too ?Xen unstable and 4.2 changed in other ways which included fixing this issue as a side effect. I can''t remember which change it was but I think it was either your asyncing up of the interfaces or the block script stuff which I did. You can probably find out which by looking at previous postings of this series. So this is a fix for the code as it was in 4.1.x Ian.
Ian Campbell writes ("Re: [Xen-devel] [PATCH 2/2] 4.1.2 blktap2 cleanup fixes."):> On Wed, 2012-11-07 at 02:07 +0000, Dr. Greg Wettstein wrote: > > --------------------------------------------------------------------------- > > The following patch when applied on top of: > > > > libxl: attempt to cleanup tapdisk processes on disk backend destroy. > > > > Establishes correct cleanup behavior for blktap devices. This patch > > implements the release of the backend device before calling for > > the destruction of the userspace component of the blktap device. > > > > Without this patch the kernel xen-blkback driver deadlocks with > > the blktap2 user control plane until the IPC channel is terminated by the > > timeout on the select() call. This results in a noticeable delay > > in the termination of the guest and causes the blktap minor > > number which had been allocated to be orphaned. > > > > Signed-off-by: Greg Wettstein <greg@enjellic.com> > > Acked-by: Ian Campbell <ian.campbell@citrix.com>I have fixed up the whitespace error and committed this to 4.1. Ian.
Reasonably Related Threads
- [PATCH 1/2] 4.1.2 blktap2 cleanup fixes.
- [PATCH] libxl: attempt to cleanup tapdisk processes on disk backend destroy
- iSCSI connection corrupts Xen block devices.
- Livelock induced failure in blktap2.
- Re: Dom0 freeze on HVM DomU Windows reboot with VGA passthrough