Andreas Olsowski
2011-Jul-22 09:18 UTC
[Xen-devel] hanging tapdisk2 processes and improper udev rules
When I xl-create a guest, I get one message per assigned block device:

root@xenturio1:/var/log# xl create /etc/xen/domains/x1test.sxp
Parsing config file /etc/xen/domains/x1test.sxp
Daemon running with PID 8704

root@xenturio1:/var/log# tail -10 error |grep SYMLINK
syslog:Jul 22 10:58:05 xenturio1 udevd[8658]: kernel-provided name 'blktap2' and NAME= 'xen/blktap-2/blktap2' disagree, please use SYMLINK+= or change the kernel to provide the proper name
syslog:Jul 22 10:58:05 xenturio1 udevd[8664]: kernel-provided name 'blktap3' and NAME= 'xen/blktap-2/blktap3' disagree, please use SYMLINK+= or change the kernel to provide the proper name

The guest works fine at that point.

root 8975 1.0 0.0 21664 3292 ? SLs 11:00 0:00 tapdisk2
root 8978 0.0 0.0 21008 916 ? S 11:00 0:00 udevd --daemon
root 8981 0.0 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
root 8983 0.0 0.0 21008 796 ? S 11:00 0:00 udevd --daemon
root 9002 0.0 0.0 21008 800 ? S 11:00 0:00 udevd --daemon
root 9020 0.0 0.0 35500 952 ? Ssl 11:00 0:00 xl create /etc/xen/domains/x1test2.sxp
root 9067 0.0 0.0 0 0 ? S 11:00 0:00 [blkback.3.xvda1]
root 9068 0.0 0.0 0 0 ? S 11:00 0:00 [blkback.3.xvda2]

Then I shut down the guest:

root@xenturio1:/var/log# xl shutdown x1test

And I am left with remaining tapdisk2 and udevd processes, one for each block device that was assigned to the guest:

root 8975 0.1 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
root 8981 0.0 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
root 8983 0.0 0.0 21008 796 ? S 11:00 0:00 udevd --daemon
root 9002 0.0 0.0 21008 800 ? S 11:00 0:00 udevd --daemon

I am using Xen 4.1.1 with the 2.6.32.43-pvops kernel from Jeremy. My distro is Debian 6.0.2, which uses udev 164-3. I updated udev to 171-3 on a different machine, but that did not help.
My xen-backend.rules contains the default:

SUBSYSTEM=="xen", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k", MODE="0600"
SUBSYSTEM=="blktap2", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k", MODE="0600"

My questions are:
- Are the two issues related?
- How can I fix them?

I suspect that eventually this will cause the host to run out of free process IDs and/or RAM.

--
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg
Tel: ++49 4131 677 1309

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
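To spot the leftover processes described above without reading through ps output by hand, one could count processes by name straight from procfs. This is a minimal sketch, assuming a Linux host with /proc mounted; the helper count_procs_named is hypothetical and not part of the Xen tools. It reads /proc/<pid>/comm (the short process name, at most 15 characters) rather than the full command line.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the processes whose short name
 * (/proc/<pid>/comm) equals `name`. Returns -1 if /proc can't be opened. */
int count_procs_named(const char *name)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(proc)) != NULL) {
        /* Only the numeric entries in /proc are process directories. */
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;

        char path[288]; /* enough for "/proc/" + any d_name + "/comm" */
        snprintf(path, sizeof path, "/proc/%s/comm", ent->d_name);

        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue; /* the process exited while we were scanning */

        char comm[64] = "";
        if (fgets(comm, sizeof comm, f) != NULL) {
            comm[strcspn(comm, "\n")] = '\0'; /* strip trailing newline */
            if (strcmp(comm, name) == 0)
                count++;
        }
        fclose(f);
    }
    closedir(proc);
    return count;
}
```

Calling count_procs_named("tapdisk2") before `xl create` and again after `xl shutdown` would show whether one process per block device is left behind; the same call with "udevd" covers the lingering udev workers.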
Ian Campbell
2011-Jul-22 09:28 UTC
Re: [Xen-devel] hanging tapdisk2 processes and improper udev rules
On Fri, 2011-07-22 at 10:18 +0100, Andreas Olsowski wrote:
> When I xl-create a guest, I get one message per assigned block device:
>
> root@xenturio1:/var/log# tail -10 error |grep SYMLINK
> syslog:Jul 22 10:58:05 xenturio1 udevd[8658]: kernel-provided name
> 'blktap2' and NAME= 'xen/blktap-2/blktap2' disagree, please use
> SYMLINK+= or change the kernel to provide the proper name
> syslog:Jul 22 10:58:05 xenturio1 udevd[8664]: kernel-provided name
> 'blktap3' and NAME= 'xen/blktap-2/blktap3' disagree, please use
> SYMLINK+= or change the kernel to provide the proper name

This is because udev and forward/backward compatibility are strangers passing in the night. I presume that if you make the recommended change to SYMLINK+= instead of NAME= in your udev script, this goes away?

> Then I shut down the guest:
> root@xenturio1:/var/log# xl shutdown x1test
>
> And I am left with remaining tapdisk2 and udevd processes, one for each
> block device that was assigned to the guest:
> root 8975 0.1 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
> root 8981 0.0 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
> root 8983 0.0 0.0 21008 796 ? S 11:00 0:00 udevd --daemon
> root 9002 0.0 0.0 21008 800 ? S 11:00 0:00 udevd --daemon

I posted a patch to fix this, "libxl: attempt to cleanup tapdisk processes on disk backend destroy", a couple of times, most recently at http://marc.info/?l=xen-devel&m=131066210526755, but it hasn't been applied yet. Can you try it?
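The recommended change amounts to a one-word edit of each rule in xen-backend.rules: replace the NAME= assignment with SYMLINK+=, so udev keeps the kernel-provided node name (for example blktap2) and adds xen/blktap-2/%k as a symlink instead of a conflicting name. A minimal sketch of that textual rewrite, assuming each rule sits on a single line as in the file quoted in the first message; name_to_symlink is a hypothetical helper, and editing the file by hand works just as well.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite the first NAME="..." assignment in a udev
 * rule line into SYMLINK+="...", leaving the rest of the line untouched.
 * Returns 0 on success, -1 if there is no NAME= assignment or out is too small. */
int name_to_symlink(const char *rule, char *out, size_t outlen)
{
    static const char key[] = "NAME=";
    const char *hit = rule;

    /* Find a standalone assignment: not NAME== (a match) and not FOONAME=. */
    while ((hit = strstr(hit, key)) != NULL) {
        int standalone = (hit == rule || hit[-1] == ' ' || hit[-1] == ',');
        int is_assign = (hit[sizeof key - 1] == '"');
        if (standalone && is_assign)
            break;
        hit += 1; /* keep searching after this occurrence */
    }
    if (hit == NULL)
        return -1;

    /* prefix before NAME=, then SYMLINK+=, then everything after NAME= */
    int n = snprintf(out, outlen, "%.*sSYMLINK+=%s",
                     (int)(hit - rule), rule, hit + sizeof key - 1);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Applied to the first default rule, the output is SUBSYSTEM=="xen", KERNEL=="blktap[0-9]*", SYMLINK+="xen/blktap-2/%k", MODE="0600"; the blktap2 rule changes the same way.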
Daniel Stodden
2011-Jul-22 09:31 UTC
Re: [Xen-devel] hanging tapdisk2 processes and improper udev rules
On Fri, 2011-07-22 at 05:18 -0400, Andreas Olsowski wrote:
> When I xl-create a guest, I get one message per assigned block device:
>
> root@xenturio1:/var/log# xl create /etc/xen/domains/x1test.sxp
> Parsing config file /etc/xen/domains/x1test.sxp
> Daemon running with PID 8704

Can you check whether it gets better when you remove that file?

Thanks,
Daniel
Daniel Stodden
2011-Jul-22 09:32 UTC
Re: [Xen-devel] hanging tapdisk2 processes and improper udev rules
On Fri, 2011-07-22 at 05:31 -0400, Daniel Stodden wrote:
> On Fri, 2011-07-22 at 05:18 -0400, Andreas Olsowski wrote:
> > When I xl-create a guest, I get one message per assigned block device:
> >
> > root@xenturio1:/var/log# xl create /etc/xen/domains/x1test.sxp
> > Parsing config file /etc/xen/domains/x1test.sxp
> > Daemon running with PID 8704
>
> Can you check whether it gets better when you remove that file?

The udev rules, in case it isn't clear :)

Daniel
Sébastien RICCIO
2011-Jul-22 09:34 UTC
Re: [Xen-devel] hanging tapdisk2 processes and improper udev rules
>> I am using Xen 4.1.1 with the 2.6.32.43-pvops kernel from Jeremy.
>> My distro is Debian 6.0.2, which uses udev 164-3.
>> I updated udev to 171-3 on a different machine, but that did not help.

Hi,

Just out of curiosity, are you running multipathd on that box? I had (in fact still have) an issue with tapdisk processes hanging while the multipathd process is running.

Sébastien
Daniel Stodden
2011-Jul-22 09:50 UTC
Re: [Xen-devel] hanging tapdisk2 processes and improper udev rules
On Fri, 2011-07-22 at 05:34 -0400, Sébastien RICCIO wrote:
> Just out of curiosity, are you running multipathd on that box? I had
> (in fact still have) an issue with tapdisk processes hanging
> while the multipathd process is running.

The processes, really? Where do they hang? (Check the wait state, e.g. with ps -eo pid,wchan:25,cmd.)

Or do you mean they're stuck waiting for I/Os?

Daniel
Sébastien RICCIO
2011-Jul-22 10:01 UTC
Re: [Xen-devel] hanging tapdisk2 processes and improper udev rules
> The processes, really? Where do they hang? (Check the wait state, e.g. with
> ps -eo pid,wchan:25,cmd.)
>
> Or do you mean they're stuck waiting for I/Os?
>
> Daniel

They seem to work and do their job, but they are in a strange state. For example, ps -aux on dom0 hangs when it reaches the line for the tapdisk process; it also cannot be detached from the VM, and rebooting the host hangs too (the process can't be killed, so the host doesn't reboot).

I fought with this quite a lot on a Debian 6 + Xen 4.1.x box and found that disabling multipath-tools and multipath-tools-boot corrected the problem (but I need them). I thought that maybe it was because multipathd tries to "multipath" the block device handled by blktap2 and somehow locks it. But that's speculation :)

I don't have my hands on the box at the moment to give you more information, and I don't want to hijack this thread. It just looked like the problem I encountered; I will send you more information when I am on the box.

Thanks,
Sébastien
Andreas Olsowski
2011-Jul-22 11:36 UTC
Re: [Xen-devel] hanging tapdisk2 processes and improper udev rules
On 07/22/2011 11:28 AM, Ian Campbell wrote:
> This is because udev and forward/backward compatibility are strangers
> passing in the night. I presume if you make the recommended change to
> SYMLINK+= instead of NAME= in your udev script this goes away?

You assume correctly.

> I posted a patch to fix this "libxl: attempt to cleanup tapdisk
> processes on disk backend destroy" a couple of times, most recently at
> http://marc.info/?l=xen-devel&m=131066210526755 but it hasn't been
> applied yet. Can you try it?

I tried it:

make -j7 tools:
...
libxl_device.c: In function ‘libxl__device_destroy’:
libxl_device.c:253: error: incompatible type for argument 1 of ‘libxl__device_destroy_tapdisk’
libxl_internal.h:321: note: expected ‘struct libxl__gc *’ but argument is of type ‘libxl__gc’
libxl_device.c:274: error: incompatible type for argument 1 of ‘libxl__device_destroy_tapdisk’
libxl_internal.h:321: note: expected ‘struct libxl__gc *’ but argument is of type ‘libxl__gc’

My C expertise is barely existent, but I took a look at tools/libxl/libxl_device.c and changed your

libxl__device_destroy_tapdisk(gc, be_path);

into

libxl__device_destroy_tapdisk(&gc, be_path);

as I had seen &gc on other lines of code. And it compiled.

I then created a guest and shut it down. At first it stayed in the -ps--- state. I wanted to look at the running processes with "ps auxww", but the ps process itself hung. I could no longer run "ps" successfully after this point.

syslog showed:

Jul 22 13:00:07 xenturio1 xl: tap-err:tap_ctl_read_message: failure reading message
Jul 22 13:00:07 xenturio1 xl: tap-err:tap_ctl_send_and_receive: failed to receive 'unknown' message

Either my hack to get your code to compile was no good, or your patch has some unforeseen side effects.

I have now rebooted the server.

As I went on to check whether multipath had any effect on it, I added devnode "^td" to the blacklist.

Now when I xl create a VM, it only boots up to a certain point and then does nothing.
If that certain point were the login prompt everything would be fine, but it isn't: http://pastebin.com/Lmie6KwY

This is what it should look like: http://pastebin.com/CsgYypbk

I will try to retrace my steps and see what I did to break my system. In the meantime I have other systems I can test stuff on.

With best regards
Ian Campbell
2011-Jul-22 14:07 UTC
Re: [Xen-devel] hanging tapdisk2 processes and improper udev rules
On Fri, 2011-07-22 at 12:36 +0100, Andreas Olsowski wrote:
> My C expertise is barely existent, but I took a look at
> tools/libxl/libxl_device.c
>
> and changed your
> libxl__device_destroy_tapdisk(gc, be_path);
> into
> libxl__device_destroy_tapdisk(&gc, be_path);
>
> as I had seen &gc on other lines of code.

That looks right. I think this is just a difference between current xen-unstable and xen-4.1 (due to 23045:c426a7140c99 FWIW).

> And it compiled.
>
> I then created a guest and shut it down.
> At first it stayed in the -ps--- state. I wanted to look at the
> running processes with "ps auxww", but the ps process itself hung.
> I could no longer run "ps" successfully after this point.

Uh. That really shouldn't happen :-/ In fact, barring a bug in the host OS itself, I'm not sure how ps can ever get into that state...

> syslog showed:
> Jul 22 13:00:07 xenturio1 xl: tap-err:tap_ctl_read_message: failure
> reading message
> Jul 22 13:00:07 xenturio1 xl: tap-err:tap_ctl_send_and_receive: failed
> to receive 'unknown' message
>
> Either my hack to get your code to compile was no good, or your patch has
> some unforeseen side effects.

It's possible that it relies on something in xen-unstable that I'm not aware of. Would it be possible for you to try to reproduce this issue with xen-unstable.hg and this patch?

Daniel, have you got any idea what might be going on here?

Ian.
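The build error above is an ordinary C pointer mismatch: in the Xen 4.1 caller, gc is a local struct (the compiler note says the argument is of type libxl__gc), while the function added by the patch expects struct libxl__gc *, so taking the address with &gc yields the expected pointer. The sketch below reproduces the situation with hypothetical stand-in types (demo_gc, destroy_tapdisk_demo), not the real libxl definitions; the pointer-holding caller is only our inference of why the patch compiles unchanged on xen-unstable.

```c
#include <stddef.h>

/* Hypothetical stand-in for libxl's gc context type (not the real definition). */
typedef struct demo_gc {
    int uses; /* how many times a callee used this context */
} demo_gc;

/* Like the patched function in the error message, the callee wants a POINTER. */
int destroy_tapdisk_demo(demo_gc *gc, const char *be_path)
{
    if (gc == NULL || be_path == NULL)
        return -1;
    gc->uses++;
    return 0;
}

/* 4.1-style caller: gc is a local struct VALUE, so pass its address.
 * Writing destroy_tapdisk_demo(gc, be_path) here reproduces the
 * "incompatible type for argument 1" error from the build log. */
int caller_with_local_gc(const char *be_path)
{
    demo_gc gc = { 0 };
    if (destroy_tapdisk_demo(&gc, be_path) != 0)
        return -1;
    return gc.uses;
}

/* Caller that already holds a pointer (assumed xen-unstable shape):
 * the patch's original spelling, destroy_tapdisk_demo(gc, ...), is right here. */
int caller_with_gc_pointer(demo_gc *gc, const char *be_path)
{
    return destroy_tapdisk_demo(gc, be_path);
}
```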
Andreas Olsowski
2011-Jul-22 14:28 UTC
Re: [Xen-devel] hanging tapdisk2 processes and improper udev rules
>> My C expertise is barely existent, but I took a look at
>> tools/libxl/libxl_device.c
>>
>> and changed your
>> libxl__device_destroy_tapdisk(gc, be_path);
>> into
>> libxl__device_destroy_tapdisk(&gc, be_path);
>>
>> as I had seen &gc on other lines of code.
>
> That looks right. I think this is just a difference between current
> xen-unstable and xen-4.1 (due to 23045:c426a7140c99 FWIW).

What does "looks right" refer to: the compilation errors or my shot-in-the-dark adjustment?

> Uh. That really shouldn't happen :-/ In fact barring a bug in the host OS
> itself I'm not sure how ps can ever get into that state...

I have had this happen before on two occasions (one of them using xm to create a guest, whereas xl worked fine), and Sébastien Riccio wrote in this thread that he encountered it too. If it returns during "normal operation", I'll write some more.

> It's possible that it relies on something in xen-unstable that I'm not
> aware of. Would it be possible for you to try and repro this issue with
> xen-unstable.hg and this patch?

Yes, I can and will do that. Probably later this evening (it's 4 PM here now), but definitely this weekend. I will reply to this thread with the results.
Ian Campbell
2011-Jul-22 14:32 UTC
Re: [Xen-devel] hanging tapdisk2 processes and improper udev rules
On Fri, 2011-07-22 at 15:28 +0100, Andreas Olsowski wrote:
> > That looks right. I think this is just a difference between current
> > xen-unstable and xen-4.1 (due to 23045:c426a7140c99 FWIW).
> What does "looks right" refer to: the compilation errors or my
> shot-in-the-dark adjustment?

Your fix looked sensible.

> Yes, I can and will do that.
> Probably later this evening (it's 4 PM here now), but definitely this weekend.
> I will reply to this thread with the results.

Thanks.

Ian.
Daniel Stodden
2011-Jul-22 18:55 UTC
Re: [Xen-devel] hanging tapdisk2 processes and multipathing
On Fri, 2011-07-22 at 06:01 -0400, Sébastien RICCIO wrote:
> They seem to work and do their job, but they are in a strange state.
> For example, ps -aux on dom0 hangs when it reaches the line for the
> tapdisk process; it also cannot be detached from the VM, and rebooting
> the host hangs too (the process can't be killed, so the host doesn't reboot).
>
> I fought with this quite a lot on a Debian 6 + Xen 4.1.x box and found
> that disabling multipath-tools and multipath-tools-boot corrected the
> problem (but I need them). I thought that maybe it was because multipathd
> tries to "multipath" the block device handled by blktap2 and somehow
> locks it. But that's speculation :)

The multipathing happens in a dm node to which tapdisk issues I/O. There's no special handling involved there whatsoever. It's completely transparent to blktap and tapdisk, as it should be.

I could imagine tapdisk wedging in dm code during some I/O operations. These should be fully asynchronous, but for some storage types under special conditions that's sometimes wishful thinking. That applies if you find a tap-ctl call (even just a list command) blocking. The blktap module does not do anything unusual to the tapdisk task.

Anyway, the first step would be to figure out where exactly it blocks. If ps is borked, try to get another shell and cat /proc/<pid>/wchan. That makes sense for both the ps and tapdisk2 tasks.

You say that from the guest's I/O perspective it still makes progress? If not, that would explain why you're unable to detach: blkback won't be able to release the device before all pending I/O is flushed.
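The wait-channel check can also be done from a small program, for instance when ps itself is unusable. A minimal sketch, assuming Linux procfs; wchan_path and read_wchan are hypothetical helpers that do what cat /proc/<pid>/wchan does.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: format "/proc/<pid>/wchan" into buf (0 on success). */
int wchan_path(pid_t pid, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/proc/%ld/wchan", (long)pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: read the wait channel of `pid` into out, like
 * `cat /proc/<pid>/wchan`. Returns 0 on success, -1 on failure. */
int read_wchan(pid_t pid, char *out, size_t outlen)
{
    char path[64];
    if (outlen == 0 || wchan_path(pid, path, sizeof path) != 0)
        return -1;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1; /* no such process, or procfs not mounted */

    out[0] = '\0';
    /* The file holds a single kernel symbol name and may lack a '\n'. */
    if (fgets(out, (int)outlen, f) == NULL && ferror(f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';
    return 0;
}
```

For example, read_wchan(8975, buf, sizeof buf) would fetch the kernel function that the tapdisk2 process with PID 8975 from the first listing is sleeping in; the file may contain just "0" when the task is running or the kernel does not report a symbol.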
To check the tapdev I/O state from the host side, do a

cat /sys/class/blktap2/tapdisk<n>/debug

That will dump some task information and a list of outstanding requests, if there are any.

> I do not have my hands on the box at the moment to give you more
> information and do not want to hijack this thread. It just looked
> like the problem I encountered; I will send you more information
> when I am on the box.

Thanks!

Daniel
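Likewise, the per-device debug dump can be collected programmatically, e.g. for each tapdev in turn. A minimal sketch; the path pattern is copied verbatim from the message above, and tapdev_debug_path and dump_file are hypothetical helpers.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the debug path quoted in the thread,
 * /sys/class/blktap2/tapdisk<n>/debug, into buf (0 on success). */
int tapdev_debug_path(int n, char *buf, size_t len)
{
    int w = snprintf(buf, len, "/sys/class/blktap2/tapdisk%d/debug", n);
    return (w < 0 || (size_t)w >= len) ? -1 : 0;
}

/* Copy a (sysfs) text file to `to`, like cat. Returns the number of bytes
 * copied, or -1 if the file can't be opened (the reason goes to stderr). */
long dump_file(const char *path, FILE *to)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        fprintf(stderr, "%s: %s\n", path, strerror(errno));
        return -1;
    }

    char buf[4096];
    size_t got;
    long total = 0;
    while ((got = fread(buf, 1, sizeof buf, f)) > 0) {
        fwrite(buf, 1, got, to);
        total += (long)got;
    }
    fclose(f);
    return total;
}
```

Looping n over the device numbers in use and calling dump_file on each path with stdout as the target reproduces the cat command for every device; a return of -1 simply means that device has no such file.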
Andreas Olsowski
2011-Jul-25 08:55 UTC
Re: [Xen-devel] hanging tapdisk2 processes and improper udev rules
Well, I did some testing this morning, as my VPN connection was borked all weekend.

__xen-unstable does not leave any tapdisk processes running.__

In fact, it would seem that tapdisk is only started to spawn the block device and then exits.

This may be normal behavior that I am misreading, but the udevd processes that are started when a guest is created stick around even after the guest is shut down; they are replaced by different udevd processes when the next guest is created.

Nevertheless, I applied your patch and tried again. The gc to &gc change wasn't necessary when applying the patch to xen-unstable. The patch had no visible effect.

I hope this info helps in creating a patch for 4.1.
Andreas Olsowski
2011-Aug-11 13:32 UTC
Re: [Xen-devel] hanging tapdisk2 processes and improper udev rules
Hi,

I was wondering whether anything has happened in the last few weeks regarding this issue.

For now I am using Xen 4.2, which either already has some kind of patch applied or does not need one.

With best regards