Hi, hope the day is going well for everyone. I had posted a note about this issue two weeks ago and didn''t get any response. I don''t know if that indicates that people are not using blktap or if the question stumped everyone. First of all there is a documentable issue with blktap in 4.1.2 and xl. Anyone trying to use it to export files to a guest will be affected. Using xl to do ''block-attach'' and ''block-detach'' in dom0 also doesn''t work in stock 4.1.2 so I suspect there are generic issues with blktap and xl. In stock 4.1.2 using xl results in the tapdisk2 server process being orphaned. There is a patch floating around from Ian Campbell which fixes this problem and either creates or uncovers what may be a more fundamental problem. With Ian''s patch applied the tapdisk2 process terminates but the tapdisk device is not released resulting in a steady accumulation of orphan minor numbers. The underlying cause of this appears to be a resource deadlock between xl requesting a detach of the VBD and the tapdisk2 process. Ian actually acknowledges the phenomenon in one of his posts saying there appeared to be a ''hang'' when the VBD is released with his patches. The ''hang'' is actually a livelock where the entire kernel blocks until the select call from xl to the tapdisk2 process times out and rescues things. The error return from the select call is what causes the xl code to not complete the release of the tapdisk minor. I''m still hunting through the code maze trying to find the underlying problem. The tapdisk2 process is hanging on the munmap of the ring buffers. It ''feels'' as if the problem may be secondary to xl still holding a reference to some resource which blocks the unmap of the ring buffers. I will continue to hunt but if anyone has any pointers send them my way. It takes a bit of time to come up to speed on all the code paths. In a broader context I think the XEN community needs to take a reasoned look at how to handle the file issue. It appears as if Dan Stodden has disappeared so I get the sense the entire blktap2 architecture is a bit rudderless (no judgement just observation). I''m trying to put my money where my mouth is and actually hunt for the problems which are there. I''ve seen rumbles on LKML about discussions with regards to modifications to loop in order to deal with the page cache issues which hinder the reliability of delivering virtual block devices over loop. I don''t know how far out that is or even if it will eventuate. I''ve also heard rumbles about a mythical ''blktap3'' which runs completely in userspace. If that is the direction things are going I would certainly be willing to hammer on that rather then put more time into blktap2 if there is ''blktap3'' code someplace. I will look forward to any comments or suggestions people may have. Have a good weekend. As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102 development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "My thoughts on trusting Open-Source? A quote I once saw said it best: ''Remember, Amateurs built the ark. Professionals built the Titanic.'' Perhaps most significantly the ark was one guy, there were no doubt committees involved with the Titanic project." -- Dr. G.W. Wettstein Resurrection
On Thu, 29 Mar 2012, greg@enjellic.com wrote:> Hi, hope the day is going well for everyone. > > I had posted a note about this issue two weeks ago and didn''t get any > response. I don''t know if that indicates that people are not using > blktap or if the question stumped everyone. > > First of all there is a documentable issue with blktap in 4.1.2 and > xl. Anyone trying to use it to export files to a guest will be > affected. Using xl to do ''block-attach'' and ''block-detach'' in dom0 > also doesn''t work in stock 4.1.2 so I suspect there are generic issues > with blktap and xl. > > In stock 4.1.2 using xl results in the tapdisk2 server process being > orphaned. There is a patch floating around from Ian Campbell which > fixes this problem and either creates or uncovers what may be a more > fundamental problem. > > With Ian''s patch applied the tapdisk2 process terminates but the > tapdisk device is not released resulting in a steady accumulation of > orphan minor numbers. The underlying cause of this appears to be a > resource deadlock between xl requesting a detach of the VBD and the > tapdisk2 process. > > Ian actually acknowledges the phenomenon in one of his posts saying > there appeared to be a ''hang'' when the VBD is released with his > patches. The ''hang'' is actually a livelock where the entire kernel > blocks until the select call from xl to the tapdisk2 process times out > and rescues things. The error return from the select call is what > causes the xl code to not complete the release of the tapdisk minor. > > I''m still hunting through the code maze trying to find the underlying > problem. The tapdisk2 process is hanging on the munmap of the ring > buffers. It ''feels'' as if the problem may be secondary to xl still > holding a reference to some resource which blocks the unmap of the > ring buffers. > > I will continue to hunt but if anyone has any pointers send them my > way. It takes a bit of time to come up to speed on all the code > paths. > > In a broader context I think the XEN community needs to take a > reasoned look at how to handle the file issue. It appears as if Dan > Stodden has disappeared so I get the sense the entire blktap2 > architecture is a bit rudderless (no judgement just observation). I''m > trying to put my money where my mouth is and actually hunt for the > problems which are there. > > I''ve seen rumbles on LKML about discussions with regards to > modifications to loop in order to deal with the page cache issues > which hinder the reliability of delivering virtual block devices over > loop. I don''t know how far out that is or even if it will eventuate. > I''ve also heard rumbles about a mythical ''blktap3'' which runs > completely in userspace. If that is the direction things are going I > would certainly be willing to hammer on that rather then put more time > into blktap2 if there is ''blktap3'' code someplace. > > I will look forward to any comments or suggestions people may have.A completely userspace blktap version is indeed in the work, but it is not clear yet when it is going to be ready. Alternatively if you don''t need VHD format support, you can use QCOW or QCOW2 with upstream QEMU in xen-unstable, that leads to very good performances and still provides copy on write support.
On Thu, 2012-03-29 at 18:35 +0100, Stefano Stabellini wrote:> On Thu, 29 Mar 2012, greg@enjellic.com wrote: > > Hi, hope the day is going well for everyone. > > > > I had posted a note about this issue two weeks ago and didn''t get any > > response. I don''t know if that indicates that people are not using > > blktap or if the question stumped everyone. > > > > First of all there is a documentable issue with blktap in 4.1.2 and > > xl. Anyone trying to use it to export files to a guest will be > > affected. Using xl to do ''block-attach'' and ''block-detach'' in dom0 > > also doesn''t work in stock 4.1.2 so I suspect there are generic issues > > with blktap and xl. > > > > In stock 4.1.2 using xl results in the tapdisk2 server process being > > orphaned. There is a patch floating around from Ian Campbell which > > fixes this problem and either creates or uncovers what may be a more > > fundamental problem. > > > > With Ian''s patch applied the tapdisk2 process terminates but the > > tapdisk device is not released resulting in a steady accumulation of > > orphan minor numbers. The underlying cause of this appears to be a > > resource deadlock between xl requesting a detach of the VBD and the > > tapdisk2 process. > > > > Ian actually acknowledges the phenomenon in one of his posts saying > > there appeared to be a ''hang'' when the VBD is released with his > > patches. The ''hang'' is actually a livelock where the entire kernel > > blocks until the select call from xl to the tapdisk2 process times out > > and rescues things. The error return from the select call is what > > causes the xl code to not complete the release of the tapdisk minor. > > > > I''m still hunting through the code maze trying to find the underlying > > problem. The tapdisk2 process is hanging on the munmap of the ring > > buffers. It ''feels'' as if the problem may be secondary to xl still > > holding a reference to some resource which blocks the unmap of the > > ring buffers. > > > > I will continue to hunt but if anyone has any pointers send them my > > way. It takes a bit of time to come up to speed on all the code > > paths. > > > > In a broader context I think the XEN community needs to take a > > reasoned look at how to handle the file issue. It appears as if Dan > > Stodden has disappeared so I get the sense the entire blktap2 > > architecture is a bit rudderless (no judgement just observation). I''m > > trying to put my money where my mouth is and actually hunt for the > > problems which are there. > > > > I''ve seen rumbles on LKML about discussions with regards to > > modifications to loop in order to deal with the page cache issues > > which hinder the reliability of delivering virtual block devices over > > loop. I don''t know how far out that is or even if it will eventuate. > > I''ve also heard rumbles about a mythical ''blktap3'' which runs > > completely in userspace. If that is the direction things are going I > > would certainly be willing to hammer on that rather then put more time > > into blktap2 if there is ''blktap3'' code someplace. > > > > I will look forward to any comments or suggestions people may have. > > A completely userspace blktap version is indeed in the work, but it is > not clear yet when it is going to be ready. > > Alternatively if you don''t need VHD format support, you can use QCOW or > QCOW2 with upstream QEMU in xen-unstable, that leads to very good > performances and still provides copy on write support. >We should still support blktap2 in the 4.2 for people who have a kernel with the necessary driver included therefore I still think the bug which Greg reports needs investigating and hopefully fixing. Ian.
On Thu, 2012-03-29 at 15:39 +0100, greg@enjellic.com wrote:> Hi, hope the day is going well for everyone. > > I had posted a note about this issue two weeks ago and didn''t get any > response. I don''t know if that indicates that people are not using > blktap or if the question stumped everyone. > > First of all there is a documentable issue with blktap in 4.1.2 and > xl. Anyone trying to use it to export files to a guest will be > affected.The issue you refer to here is the deadlock describe below right? The block device is actually functional while the guest is up and running?> Using xl to do ''block-attach'' and ''block-detach'' in dom0 > also doesn''t work in stock 4.1.2 so I suspect there are generic issues > with blktap and xl.Stefano was looking at block-attach to dom0 more generally in xen-unstable too.> In stock 4.1.2 using xl results in the tapdisk2 server process being > orphaned. There is a patch floating around from Ian Campbell which > fixes this problem and either creates or uncovers what may be a more > fundamental problem.That patch has been applied to xen-unstable -- does this hang occur for you there? (I''m in the process of installing a suitable system to try and reproduce).> With Ian''s patch applied the tapdisk2 process terminates but the > tapdisk device is not released resulting in a steady accumulation of > orphan minor numbers. The underlying cause of this appears to be a > resource deadlock between xl requesting a detach of the VBD and the > tapdisk2 process.When you say "the tapdisk2 process terminates" I guess you mean "tries to terminate but gets blocked" rather than actually quits, since otherwise I don''t think there would be a deadlock? I think an approach worth trying would be to have tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before doing the actual detach. i.e. it would respond with "Yes, I will do that" rather than "Yes, I have done that". My speculation is that this will allow libxl to continue and hopefully avoid the deadlock. A similar alternative would be to have tap_ctl_detach not wait for a response at all. In both cases I think it could be argued that if the TAPDISK_MESSAGE_DETACH message does not result in tapdisk2 shutting down there isn''t much that libxl could do about it anyway so it might as well get on with its life. The other approach would be to figure out what libxl is doing after its call (or perhaps just not doing) to tap_ctl_detach which actually has to be done first e..g perhaps destroying the xenstore backend directory?> I''ve also heard rumbles about a mythical ''blktap3'' which runs > completely in userspace. If that is the direction things are going I > would certainly be willing to hammer on that rather then put more time > into blktap2 if there is ''blktap3'' code someplace.I expect that a lot of the control-plane stuff will be the same between blktap2 and 3 so any issues fixed in 2 will likely help 3 too. It is also worth fixing issues in blktap2 simply for the sake of fixing them since there will still be people who are using it in 4.2. Ian.
On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote:> I think an approach worth trying would be to have > tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before > doing the actual detach. i.e. it would respond with "Yes, I will do > that" rather than "Yes, I have done that". My speculation is that this > will allow libxl to continue and hopefully avoid the deadlock.This seems to be the case as the following fixes things for me. Thanks very much for your analysis which lead me to this solution... Ian. 8<-------------------------------- # HG changeset patch # User Ian Campbell <ian.campbell@citrix.com> # Date 1333106445 -3600 # Node ID 79e3dbe4b659e78408a9eea76c51a601bd4a383a # Parent 947e095a6d29b5e90a01b7c06b298d4c91dcf3ac tapdisk: respond to destroy request before tearing down the commuication channel This avoids a deadlock while sending the response which leads to a 5s timeout and leaked tapdisk processes. Instead send the response after getting the VBD which appears to be the only step wich can fail (or at least the only one which returns an error code). Also call libxl__device_destroy_tapdisk before removing the backend directory in xenstore since we need the tapdisk-params value. Reported-by: Greg Wettstein <greg@enjellic.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> diff -r 947e095a6d29 -r 79e3dbe4b659 tools/blktap2/drivers/tapdisk-control.c --- a/tools/blktap2/drivers/tapdisk-control.c Tue Mar 27 14:09:11 2012 +0100 +++ b/tools/blktap2/drivers/tapdisk-control.c Fri Mar 30 12:20:45 2012 +0100 @@ -379,10 +379,18 @@ tapdisk_control_detach_vbd(struct tapdis int err; vbd = tapdisk_server_get_vbd(request->cookie); - if (!vbd) { - err = -EINVAL; - goto out; - } + err = !vbd ? -EINVAL : 0; + + memset(&response, 0, sizeof(response)); + response.type = TAPDISK_MESSAGE_DETACH_RSP; + response.cookie = request->cookie; + response.u.response.error = -err; + + tapdisk_control_write_message(connection->socket, &response, 2); + tapdisk_control_close_connection(connection); + + if (err) + return; tapdisk_vbd_detach(vbd); @@ -390,16 +398,6 @@ tapdisk_control_detach_vbd(struct tapdis tapdisk_server_remove_vbd(vbd); free(vbd); } - - err = 0; -out: - memset(&response, 0, sizeof(response)); - response.type = TAPDISK_MESSAGE_DETACH_RSP; - response.cookie = request->cookie; - response.u.response.error = -err; - - tapdisk_control_write_message(connection->socket, &response, 2); - tapdisk_control_close_connection(connection); } static void diff -r 947e095a6d29 -r 79e3dbe4b659 tools/libxl/libxl_device.c --- a/tools/libxl/libxl_device.c Tue Mar 27 14:09:11 2012 +0100 +++ b/tools/libxl/libxl_device.c Fri Mar 30 12:20:45 2012 +0100 @@ -432,11 +432,11 @@ int libxl__device_destroy(libxl__gc *gc, char *be_path = libxl__device_backend_path(gc, dev); char *fe_path = libxl__device_frontend_path(gc, dev); + libxl__device_destroy_tapdisk(gc, be_path); + xs_rm(ctx->xsh, XBT_NULL, be_path); xs_rm(ctx->xsh, XBT_NULL, fe_path); - libxl__device_destroy_tapdisk(gc, be_path); - return 0; }
Ian Campbell writes ("Re: [Xen-devel] Basic blktap2 functionality issues."):> On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote: > > I think an approach worth trying would be to have > > tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before > > doing the actual detach. i.e. it would respond with "Yes, I will do > > that" rather than "Yes, I have done that". My speculation is that this > > will allow libxl to continue and hopefully avoid the deadlock. > > This seems to be the case as the following fixes things for me. Thanks > very much for your analysis which lead me to this solution...Greg, can you confirm whether this works for you ? Ian.
On Mar 30, 12:23pm, Ian Campbell wrote: } Subject: Re: [Xen-devel] Basic blktap2 functionality issues. Hi, hope the weekend is going well for everyone. Sorry for the delay in getting back to everyone on this, had a deadline on another project.> On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote: > > I think an approach worth trying would be to have > > tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before > > doing the actual detach. i.e. it would respond with "Yes, I will do > > that" rather than "Yes, I have done that". My speculation is that this > > will allow libxl to continue and hopefully avoid the deadlock.> This seems to be the case as the following fixes things for > me. Thanks very much for your analysis which lead me to this > solution...I ported your fix into 4.1.2 but I think we still have a problem, at least in this codebase. I no longer see the select timeout delay when xl shuts down but upon shutdown the minor number is not freed. A ''tap-ctl list'' shows a steadily increasing set of orphaned minor numbers as VM''s are started up and shutdown. Are you seeing this in your development codebase? The culprit is a failed ioctl call for BLKTAP2_IOCTL_FREE_TAP in tap_ctl_free(). The underlying reason for the ioctl failure is the check in [linuxsrc]:drivers/block/blktap/ring.c:blktap_ring_destroy() for whether or not the task_struct pointer in the blktap_ring structure has been NULLed. Which certainly makes sense since there is a race between xl''s call to tap_ctl_free() and tapdisk2 getting to the point where it can close its descriptor to the blktap instance and thus invoke the .release method which translates into a call to blktap_ring_release() which NULL''s the task_struct pointer. If you are not seeing the orphan minor numbers there must be ordering changes in the unstable version of xl which eliminate or alters the race timing. For the sake of completeness of information for this thread I captured the following stack trace of a tapdisk2 when it is deadlocked against xl: --------------------------------------------------------------------------- Apr 1 07:03:45 hooter kernel: Call Trace: Apr 1 07:03:45 hooter kernel: [<c10aa791>] ? blkdev_get_blocks+0xb4/0xb4 Apr 1 07:03:45 hooter kernel: [<c109ab11>] ? iput+0x28/0x143 Apr 1 07:03:45 hooter kernel: [<c10e9fec>] ? blk_peek_request+0x155/0x165 Apr 1 07:03:45 hooter kernel: [<c128279c>] schedule+0x4d/0x4f Apr 1 07:03:45 hooter kernel: [<f7a9053b>] blktap_device_destroy_sync+0x63/0x76 [blktap] Apr 1 07:03:45 hooter kernel: [<c104204a>] ? wake_up_bit+0x61/0x61 Apr 1 07:03:45 hooter kernel: [<f7a8f57a>] blktap_ring_release+0xe/0x29 [blktap] Apr 1 07:03:45 hooter kernel: [<c108aba7>] fput+0xce/0x167 Apr 1 07:03:45 hooter kernel: [<c107944a>] remove_vma+0x28/0x47 Apr 1 07:03:45 hooter kernel: [<c107a19a>] do_munmap+0x1e8/0x204 Apr 1 07:03:45 hooter kernel: [<c107a1de>] sys_munmap+0x28/0x37 Apr 1 07:03:45 hooter kernel: [<c1284adc>] sysenter_do_call+0x12/0x2c Apr 1 07:03:45 hooter kernel: [<c1280000>] ? migration_call+0x1d9/0x1f2 Apr 1 07:03:45 hooter kernel: c686dad0 00000286 c1045bcd e86581c0 e84f6380 c13d1c80 c13d1c80 d753476c Apr 1 07:03:45 hooter kernel: e9185b00 e9185c78 00000000 d75346cb 00002ad2 d75346cb e8515ae4 e8515ae4 Apr 1 07:03:45 hooter kernel: c1005597 00000000 ed7d80c8 c686dae4 c686da98 c1005d10 ed7d02c0 00000000 --------------------------------------------------------------------------- Let me know if you are seeing the issues I''m seeing, in the meantime I will keep hunting to see if I can rundown the ultimate cause of the deadlock. Given the above trace it has to be an issue with xl orchestrating the release of resources which reference the tapdev block device.> Ian.Will look forward to your thoughts. Have a good weekend. }-- End of excerpt from Ian Campbell As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102 development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "According to the philosopher Ly Tin Wheedle, chaos is found in greatest abundance wherever order is being sought. It always defeats order, because it is better organized." -- Terry Pratchett Interesting Times
On Sat, 2012-04-07 at 16:59 +0100, Dr. Greg Wettstein wrote:> On Mar 30, 12:23pm, Ian Campbell wrote: > } Subject: Re: [Xen-devel] Basic blktap2 functionality issues. > > Hi, hope the weekend is going well for everyone. > > Sorry for the delay in getting back to everyone on this, had a > deadline on another project. > > > On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote: > > > I think an approach worth trying would be to have > > > tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before > > > doing the actual detach. i.e. it would respond with "Yes, I will do > > > that" rather than "Yes, I have done that". My speculation is that this > > > will allow libxl to continue and hopefully avoid the deadlock. > > > This seems to be the case as the following fixes things for > > me. Thanks very much for your analysis which lead me to this > > solution... > > I ported your fix into 4.1.2 but I think we still have a problem, at > least in this codebase. > > I no longer see the select timeout delay when xl shuts down but upon > shutdown the minor number is not freed. A ''tap-ctl list'' shows a > steadily increasing set of orphaned minor numbers as VM''s are started > up and shutdown. > > Are you seeing this in your development codebase?It turns out that I am, yes. e.g. after starting/stopping a guest 3 times: # tap-ctl list - 0 - - - - 1 - - - - 2 - - -> The culprit is a failed ioctl call for BLKTAP2_IOCTL_FREE_TAP in > tap_ctl_free(). The underlying reason for the ioctl failure is the > check in [linuxsrc]:drivers/block/blktap/ring.c:blktap_ring_destroy()drivers/*xen*/... right? Or do you have a different blktap to me?> for whether or not the task_struct pointer in the blktap_ring > structure has been NULLed. > > Which certainly makes sense since there is a race between xl''s call to > tap_ctl_free() and tapdisk2 getting to the point where it can close > its descriptor to the blktap instance and thus invoke the .release > method which translates into a call to blktap_ring_release() which > NULL''s the task_struct pointer.Sounds right, but then you would expect both the ioctl and release path to cleanup, depending on who loses the race? There also seems to be a BLKTAP_SHUTDOWN_REQUESTED bit which looks like it is involved somehow... We''ve gotten way beyond my understanding of blktap internals though.> If you are not seeing the orphan minor numbers there must be ordering > changes in the unstable version of xl which eliminate or alters the > race timing. > > For the sake of completeness of information for this thread I captured > the following stack trace of a tapdisk2 when it is deadlocked against > xl:[... thanks ...]> Let me know if you are seeing the issues I''m seeing, in the meantime I > will keep hunting to see if I can rundown the ultimate cause of the > deadlock. Given the above trace it has to be an issue with xl > orchestrating the release of resources which reference the tapdev > block device.I''m not convinced that the shutdown stuff on the kernel side isn''t either horribly broken or at best fragile (perhaps xl just tickles it differently to xend). Please do keep hunting and let us know what you find... Thanks, Ian
On Apr 10, 10:04am, Ian Campbell wrote: } Subject: Re: [Xen-devel] Basic blktap2 functionality issues.> On Sat, 2012-04-07 at 16:59 +0100, Dr. Greg Wettstein wrote: > > On Mar 30, 12:23pm, Ian Campbell wrote: > > } Subject: Re: [Xen-devel] Basic blktap2 functionality issues.> > > On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote: > > > > I think an approach worth trying would be to have > > > > tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before > > > > doing the actual detach. i.e. it would respond with "Yes, I will do > > > > that" rather than "Yes, I have done that". My speculation is that this > > > > will allow libxl to continue and hopefully avoid the deadlock. > > > > > This seems to be the case as the following fixes things for > > > me. Thanks very much for your analysis which lead me to this > > > solution... > > > > I ported your fix into 4.1.2 but I think we still have a problem, at > > least in this codebase. > > > > I no longer see the select timeout delay when xl shuts down but upon > > shutdown the minor number is not freed. A ''tap-ctl list'' shows a > > steadily increasing set of orphaned minor numbers as VM''s are started > > up and shutdown. > > > > Are you seeing this in your development codebase?> It turns out that I am, yes. e.g. after starting/stopping a guest 3 > times: > # tap-ctl list > - 0 - - - > - 1 - - - > - 2 - - -Yes, that is the same thing I''m seeing and would be expected.> > The culprit is a failed ioctl call for BLKTAP2_IOCTL_FREE_TAP in > > tap_ctl_free(). The underlying reason for the ioctl failure is the > > check in [linuxsrc]:drivers/block/blktap/ring.c:blktap_ring_destroy()> drivers/*xen*/... right? Or do you have a different blktap to me?I believe the ''drivers/*xen*'' was an old location. You may have seen the note I sent to Tim Wood yesterday. We isolated the blktap2 code from the last GIT tree which Dan Stodden had available and are carrying it as a standalone patch. The version we are testing against is available from the following location: ftp://ftp.enjellic.com/pub/xen/blktap2-3.2.patch Dan had the code in the following location: drivers/block/blktap> > for whether or not the task_struct pointer in the blktap_ring > > structure has been NULLed. > > > > Which certainly makes sense since there is a race between xl''s call to > > tap_ctl_free() and tapdisk2 getting to the point where it can close > > its descriptor to the blktap instance and thus invoke the .release > > method which translates into a call to blktap_ring_release() which > > NULL''s the task_struct pointer.> Sounds right, but then you would expect both the ioctl and release > path to cleanup, depending on who loses the race?The problem is the ioctl path can''t clean up properly since it is blocked from doing so by the check for a valid task_struct pointer. The tapdisk2 side cleans up properly but with the ''acknowledge the detach and then detach patch'' the xl side is always guaranteed to win the race. So the patch moved the failure one ladder step downwards in: [xensrc]/tools/blktap2/control/tap-ctl-destroy.c:tap_control_destroy() But the effective result is the same, modulo the elimination of the ''hang'' while the select call times out.> There also seems to be a BLKTAP_SHUTDOWN_REQUESTED bit which looks like > it is involved somehow... > > We''ve gotten way beyond my understanding of blktap internals though.That bit is set by the kernel kobject infrastructure as part of the teardown of the block device and I believe is implicated to the extent that the xl<->tapdisk2 deadlock is caused by a reference still being held to the block device when the the detach message is sent to tapdisk2. I can now reproduce the deadlock with a tap-ctl recipe which should help in chasing things down. CAUTION: *** the following will livelock your kernel, you will need to have a shell available to force an unmount of the tapdev device. *** 1.) Create aio based device with tap-ctl. 2.) Mount tapdev device. 3.) Close device with tap-ctl. 4.) Send detach message with tap-ctl. * Livelock * 5.) Kill tapdisk2 process. 6.) Unmount tapdev - kernel issues WARNING from fs/buffer.c. Having this recipe should at least help run down which lock we are wedging on.> > Let me know if you are seeing the issues I''m seeing, in the meantime I > > will keep hunting to see if I can rundown the ultimate cause of the > > deadlock. Given the above trace it has to be an issue with xl > > orchestrating the release of resources which reference the tapdev > > block device.> I''m not convinced that the shutdown stuff on the kernel side isn''t > either horribly broken or at best fragile (perhaps xl just tickles > it differently to xend).I don''t believe its fragile as much as the classic problem of having the kernel dependent on userspace. I think the underlying problem is the device release order between xend and xl. I believe xend is releasing all references to the tapdev device before invoking userspace cleanup.> Please do keep hunting and let us know what you find...I will wade a bit deeper into xl to see if I can isolate the ordering problem.> Thanks, > IanThanks for the verification from your end on the problem. Have a good day. Greg }-- End of excerpt from Ian Campbell As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102 development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "Inspite of all evidence to the contrary, the entire universe is composed of only two basic substances: magic and bullshit." -- Ian Macdonald
On Apr 2, 4:51pm, Ian Jackson wrote: } Subject: Re: [Xen-devel] Basic blktap2 functionality issues. Hi Ian, et. al, hope this note finds your day going well.> Ian Campbell writes ("Re: [Xen-devel] Basic blktap2 functionality issues."): > > On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote: > > > I think an approach worth trying would be to have > > > tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before > > > doing the actual detach. i.e. it would respond with "Yes, I will do > > > that" rather than "Yes, I have done that". My speculation is that this > > > will allow libxl to continue and hopefully avoid the deadlock. > > > > This seems to be the case as the following fixes things for me. Thanks > > very much for your analysis which lead me to this solution...> Greg, can you confirm whether this works for you ?I''ve been grinding through the blktap2 issues and came up with some additional findings I wanted to bounce off everyone. The ability to reproduce this problem with the tap-ctl utility helped confirm what the problem ''smelled like'' from the beginning, ie. a hung reference to the tapdev device instance which is ultimately preventing the .release method from completing on the tapdisk2 side of things. I started snooping around in xenstore and I believe I am seeing an issue where xl does not release the the vbd instance in dom0. This causes a steady and persistent increase in the number of ''stale'' vbd instances held in dom0. At a minimum this would suggest that xl is not properly releasing resources properly which may help track the whole issue down. I replaced the tap: directive with a phy: device instance and saw the orphan vbd instance as well so the problem doesn''t seem related to a blktap driver instance. This is with the original patch applied which ''fixed'' the problem with the tapdisk2 process being left alive. I''m going to back out that patch to see if it reproduces with stock 4.1.2. In the meantime let me know if anyone else is seeing this or if anyone remembers a generic problem like this being fixed in xl in unstable.> Ian.Have a good afternoon. }-- End of excerpt from Ian Jackson As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102 development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." -- Slashdot