Nathan March
2011-Jul-27 23:29 UTC
[Xen-devel] Creating a vm with a non-existent /dev/mapper/ tap2 device effectively hangs dom0 system
Have an interesting one here, originally found on xen 4.1.0 but just
upgraded to xen 4.1.1 and it's still here.
Creating a VM with a tap2 device pointed at /dev/mapper/something, when
that device doesn't exist, causes the tapdisk2 process to go into D mode
and also manages to take out any process that queries it.
For example, I have /dev/mapper/nathanxenuk1 as a valid disk and
/dev/mapper/test which doesn't exist. Creating using libvirt:
<opt type="xen">
<name>nathanxenuk1</name>
<devices>
<disk type="file">
<driver name="tap2" cache="default" type="aio"
/>
<source file="/dev/mapper/nathanxenuk1" />
<target dev="xvda" />
</disk>
<disk type="file">
<driver name="tap2" cache="default" type="aio"
/>
<source file="/dev/mapper/test" />
<target dev="xvdc" />
</disk>
<interface type="bridge">
<mac address="00:16:3E:10:00:01" />
<script path="/etc/xen/scripts/vif-bridge" />
<source bridge="vlan91" />
</interface>
</devices>
<memory>4194304</memory>
<os>
<bootloader>/usr/bin/pygrub</bootloader>
<type>linux</type>
</os>
<vcpu>12</vcpu>
</opt>
Results in:
libvirt error code: 11, message: POST operation failed: xend_post: error
from xen daemon: (xend.err "Error creating domain:
('create', '-aaio:/dev/mapper/test') failed (512 )")
Normal so far and what I'd expect. At this point, however, doing anything
that queries that tapdisk2 pid in /proc/ will fully hang.
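One way to avoid reaching this state is to check the disk sources before handing the XML to libvirt. The following is only a minimal sketch under stated assumptions: the function name missing_disk_sources and the injectable exists parameter are made up for illustration, and it only looks at file= and dev= attributes on source elements under devices/disk, as in the XML above.

```python
import os
import xml.etree.ElementTree as ET


def missing_disk_sources(domain_xml, exists=os.path.exists):
    """Return disk source paths from the domain XML that do not exist.

    Hypothetical helper: 'exists' is injectable so the check can be
    exercised without touching the real filesystem.
    """
    root = ET.fromstring(domain_xml)
    missing = []
    # Matches <devices><disk><source .../></disk></devices> under the root.
    for source in root.findall("./devices/disk/source"):
        path = source.get("file") or source.get("dev")
        if path and not exists(path):
            missing.append(path)
    return missing
```

With the XML above, a call such as missing_disk_sources(xml_text) would be expected to report /dev/mapper/test on a host where that device is absent, so the create request can be skipped.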
Doing an "strace -f ps auxf &" (backgrounding so I can keep my console),
it ends here and I can find the pid causing it:
open("/proc/11327/status", O_RDONLY) = 6
read(6, "Name:\ttapdisk2\nState:\tD (disk sl"..., 1023) = 706
close(6) = 0
open("/proc/11327/cmdline", O_RDONLY) = 6
read(6,
Trying to do almost anything against /proc/11327/ results in a hang, but
I can see the FDs OK:
ukxen2 ~ # cd /proc/11327
ukxen2 11327 # ls -al &
[2] 12144
ukxen2 11327 # cd fd
ukxen2 fd # ls -al &
[3] 12211
ukxen2 fd # total 0
dr-x------ 2 root root 0 Jul 27 16:24 .
dr-xr-xr-x 7 root root 0 Jul 27 16:16 ..
lrwx------ 1 root root 64 Jul 27 16:27 0 -> /dev/null
lrwx------ 1 root root 64 Jul 27 16:27 1 -> /dev/null
lrwx------ 1 root root 64 Jul 27 16:27 2 -> /dev/null
lrwx------ 1 root root 64 Jul 27 16:27 3 -> socket:[39528]
lrwx------ 1 root root 64 Jul 27 16:27 4 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 Jul 27 16:27 5 -> socket:[39531]
lrwx------ 1 root root 64 Jul 27 16:27 6 -> socket:[40730]
[3]+ Done ls --color=auto -al
ukxen2 fd #
And /proc/11327/status works:
ukxen2 11327 # cat status &
[3] 12236
ukxen2 11327 # Name: tapdisk2
State: D (disk sleep)
Tgid: 11327
Pid: 11327
PPid: 1
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 64
Groups:
VmPeak: 23056 kB
VmSize: 21644 kB
VmLck: 21640 kB
VmHWM: 3848 kB
VmRSS: 3232 kB
VmData: 364 kB
VmStk: 88 kB
VmExe: 224 kB
VmLib: 2460 kB
VmPTE: 64 kB
Threads: 1
SigQ: 2/6081
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 0000000181000242
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
Cpus_allowed: 1
Cpus_allowed_list: 0
Mems_allowed: 1
Mems_allowed_list: 0
voluntary_ctxt_switches: 4155
nonvoluntary_ctxt_switches: 3559
Even doing an ls or trying to use tab completion in /proc/11327/ will
result in your process going into D mode.
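Since /proc/11327/status stayed readable here while cmdline blocked, a scan for stuck processes can restrict itself to the status file. This is a rough sketch based only on that observation, not a guaranteed-safe tool: the names parse_state and d_state_pids are hypothetical, and on another kernel the status read could block as well.

```python
import os


def parse_state(status_text):
    """Extract the one-letter state code from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def d_state_pids(proc_root="/proc"):
    """Yield (pid, name) for tasks in state D, reading only the status file."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        if parse_state(text) == "D":
            # The first line of the status file is "Name:\t<name>".
            name = text.splitlines()[0].split(":", 1)[1].strip()
            yield int(entry), name
```

Iterating over d_state_pids() never opens cmdline, which is the read that ps blocked on in the strace above.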
Any existing VMs stay running fine and I can manage them remotely via
libvirt, so only the dom0 is affected.
Any ideas? =)
Thanks,
Nathan
--
Nathan March <nathan@gt.net>
Gossamer Threads Inc. http://www.gossamer-threads.com/
Tel: (604) 687-5804 Fax: (604) 687-5806
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Nathan March
2011-Jul-28 19:00 UTC
Re: [Xen-devel] Creating a vm with a non-existent /dev/mapper/ tap2 device effectively hangs dom0 system
On 7/27/2011 4:29 PM, Nathan March wrote:> Have an interesting one here, originally found on xen 4.1.0 but just > upgraded to xen 4.1.1 and it''s still here. > > Creating a VM with a tap2 device pointed at /dev/mapper/something, > when that device doesn''t exist, causes the tapdisk2 process to go into > D mode and also manages to take out any process that queries it. >This also happens on proper shutdown of a VM, so I must have done something crazy to the setup here since other people haven''t been complaining. If I start a VM, strace it''s tapdisk2 and then send the VM a shutdown, the strace shows tapdisk2 hanging here: 12037 gettimeofday({1311879426, 739622}, NULL) = 0 12037 gettimeofday({1311879426, 739717}, NULL) = 0 12037 select(8, [3 4 7], [], [], {600, 0}) = 1 (in [3], left {599, 993029}) 12037 gettimeofday({1311879426, 746896}, NULL) = 0 12037 accept(3, 0, NULL) = 6 12037 gettimeofday({1311879426, 747079}, NULL) = 0 12037 gettimeofday({1311879426, 747169}, NULL) = 0 12037 gettimeofday({1311879426, 747257}, NULL) = 0 12037 select(8, [3 4 6 7], [], [], {600, 0}) = 1 (in [6], left {599, 999948}) 12037 gettimeofday({1311879426, 747544}, NULL) = 0 12037 select(7, [6], NULL, NULL, {2, 0}) = 1 (in [6], left {1, 999998}) 12037 read(6, "\r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 280) = 280 12037 gettimeofday({1311879426, 747932}, NULL) = 0 12037 sendto(5, "<30>Jul 28 11:57:06 tapdisk2[12036]: received ''close'' message (uuid = 0)\n", 73, MSG_NOSIGNAL, 
NULL, 0) = 73 12037 close(8) = 0 12037 gettimeofday({1311879426, 749118}, NULL) = 0 12037 sendto(5, "<30>Jul 28 11:57:06 tapdisk2[12036]: closed image /dev/mapper/nathanxenuk1 (0 users, state: 0x00000000, type: 0)\n", 113, MSG_NOSIGNAL, NULL, 0) = 113 12037 gettimeofday({1311879426, 749536}, NULL) = 0 12037 sendto(5, "<30>Jul 28 11:57:06 tapdisk2[12036]: sending ''close response'' message (uuid = 0)\n", 81, MSG_NOSIGNAL, NULL, 0) = 81 12037 select(7, NULL, [6], NULL, {2, 0}) = 1 (out [6], left {1, 999998}) 12037 writeclose(6) = 0 12037 gettimeofday({1311879426, 750295}, NULL) = 0 12037 gettimeofday({1311879426, 750384}, NULL) = 0 12037 select(8, [3 4 7], [], [], {600, 0}) = 1 (in [3], left {599, 999936}) 12037 gettimeofday({1311879426, 750690}, NULL) = 0 12037 accept(3, 0, NULL) = 6 12037 gettimeofday({1311879426, 750801}, NULL) = 0 12037 gettimeofday({1311879426, 750854}, NULL) = 0 12037 gettimeofday({1311879426, 750905}, NULL) = 0 12037 select(8, [3 4 6 7], [], [], {600, 0}) = 1 (in [6], left {599, 999946}) 12037 gettimeofday({1311879426, 751085}, NULL) = 0 12037 select(7, [6], NULL, NULL, {2, 0}) = 1 (in [6], left {1, 999998}) 12037 readgettimeofday({1311879426, 751550}, NULL) = 0 12037 sendto(5, "<30>Jul 28 11:57:06 tapdisk2[12036]: received ''detach'' message (uuid = 0)\n", 74, MSG_NOSIGNAL, NULL, 0) = 74 12037 close(7) = 0 12037 munmap(0x7ffc389d7000, 1445888 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME tapdisk2 12037 root cwd DIR 8,1 4096 2 / tapdisk2 12037 root rtd DIR 8,1 4096 2 / tapdisk2 12037 root txt REG 8,1 496268 180124 /usr/sbin/tapdisk2 tapdisk2 12037 root mem REG 8,1 1412272 268124 /lib64/libc-2.11.2.so tapdisk2 12037 root mem REG 8,1 534648 267759 /lib64/libm-2.11.2.so tapdisk2 12037 root mem REG 8,1 137732 267539 /lib64/libpthread-2.11.2.so tapdisk2 12037 root mem REG 8,1 14512 267757 /lib64/libdl-2.11.2.so tapdisk2 12037 root mem REG 8,1 164708 180168 /usr/lib64/libxenctrl.so.4.0.0 tapdisk2 12037 root mem REG 8,1 18832 267724 
/lib64/libuuid.so.1.3.0 tapdisk2 12037 root mem REG 8,1 410267 180118 /usr/lib64/libvhd.so.1.0.0 tapdisk2 12037 root mem REG 8,1 88368 268110 /lib64/libz.so.1.2.3 tapdisk2 12037 root mem REG 8,1 35656 267750 /lib64/librt-2.11.2.so tapdisk2 12037 root mem REG 8,1 128416 267762 /lib64/ld-2.11.2.so tapdisk2 12037 root mem CHR 251,0 44028 /dev/xen/blktap-2/blktap0 tapdisk2 12037 root 0u CHR 1,3 0t0 1539 /dev/null tapdisk2 12037 root 1u CHR 1,3 0t0 1539 /dev/null tapdisk2 12037 root 2u CHR 1,3 0t0 1539 /dev/null tapdisk2 12037 root 3u unix 0xffff880039c862c0 0t0 44033 /var/run/blktap-control/ctl12037 tapdisk2 12037 root 4u 0000 0,8 0 1000 anon_inode tapdisk2 12037 root 5u unix 0xffff880039cbe840 0t0 44036 socket tapdisk2 12037 root 7u CHR 251,0 0t0 44028 /dev/xen/blktap-2/blktap0 tapdisk2 12037 root 8u BLK 252,0 0t0 36899 /dev/mapper/nathanxenuk1 The /dev/mapper devices are coming from a dell md3200i, using open-iscsi 2.0.871 and multipath-tools-0.4.9-r2. This is using the main xen 4.1.1 release, with jeremy''s git dom0 kernel (2.6.32.43). Anyone have any idea what might be happening here? - Nathan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Nathan March
2011-Jul-28 22:13 UTC
Re: [Xen-devel] Creating a vm with a non-existent /dev/mapper/ tap2 device effectively hangs dom0 system
On 7/28/2011 12:00 PM, Nathan March wrote:
> On 7/27/2011 4:29 PM, Nathan March wrote:
>> Have an interesting one here, originally found on xen 4.1.0 but just
>> upgraded to xen 4.1.1 and it's still here.
>>
>> Creating a VM with a tap2 device pointed at /dev/mapper/something,
>> when that device doesn't exist, causes the tapdisk2 process to go
>> into D mode and also manages to take out any process that queries it.
>>
>
> This also happens on proper shutdown of a VM, so I must have done
> something crazy to the setup here since other people haven't been
> complaining. If I start a VM, strace its tapdisk2 and then send the
> VM a shutdown, the strace shows tapdisk2 hanging here:

*sigh*, for anyone who finds this via google, tracked it down to a stupid
error. Shouldn't be using tapdisk2 for accessing an iscsi device since
it appears as a raw block device. After changing to using phy: things
work properly.

- Nathan
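The fix described above (phy: for raw block devices, tap2 for image files) can be turned into a small guard when generating disk definitions. This is just a sketch under stated assumptions: pick_disk_driver and its stat_fn parameter are hypothetical names, and the returned strings simply mirror the driver names mentioned in this thread.

```python
import os
import stat


def pick_disk_driver(path, stat_fn=os.stat):
    """Return "phy" for block devices and "tap2" for anything else.

    os.stat follows symlinks, so a /dev/mapper entry that is a symlink
    is classified by its target. A missing path raises OSError, which
    also catches the non-existent device case before a domain is created.
    """
    mode = stat_fn(path).st_mode
    return "phy" if stat.S_ISBLK(mode) else "tap2"
```

For a multipath iSCSI LUN such as /dev/mapper/nathanxenuk1 this would be expected to select phy, while a regular disk image file would keep tap2.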
Konrad Rzeszutek Wilk
2011-Jul-29 16:23 UTC
Re: [Xen-devel] Creating a vm with a non-existent /dev/mapper/ tap2 device effectively hangs dom0 system
On Thu, Jul 28, 2011 at 12:00:53PM -0700, Nathan March wrote:
> On 7/27/2011 4:29 PM, Nathan March wrote:
> > Have an interesting one here, originally found on xen 4.1.0 but
> > just upgraded to xen 4.1.1 and it's still here.
> >
> > Creating a VM with a tap2 device pointed at /dev/mapper/something,
> > when that device doesn't exist, causes the tapdisk2 process to go
> > into D mode and also manages to take out any process that queries
> > it.

Daniel, any ideas? [edit: Asked Nathan to pull latest Jeremy's with your blktap fix]

> This also happens on proper shutdown of a VM, so I must have done
> something crazy to the setup here since other people haven't been
> complaining. If I start a VM, strace its tapdisk2 and then send the
> VM a shutdown, the strace shows tapdisk2 hanging here:
>
> [...]
>
> The /dev/mapper devices are coming from a dell md3200i, using
> open-iscsi 2.0.871 and multipath-tools-0.4.9-r2.
>
> This is using the main xen 4.1.1 release, with jeremy's git dom0
> kernel (2.6.32.43).

Oh, wait. Did you update it to the latest? Jeremy pulled in a blktap fix.

> Anyone have any idea what might be happening here?
>
> - Nathan
Nathan March
2011-Jul-29 16:26 UTC
Re: [Xen-devel] Creating a vm with a non-existent /dev/mapper/ tap2 device effectively hangs dom0 system
On 7/29/2011 9:23 AM, Konrad Rzeszutek Wilk wrote:
>> The /dev/mapper devices are coming from a dell md3200i, using
>> open-iscsi 2.0.871 and multipath-tools-0.4.9-r2.
>>
>> This is using the main xen 4.1.1 release, with jeremy's git dom0
>> kernel (2.6.32.43).
> Oh, wait. Did you update it to the latest? Jeremy pulled in a blktap fix.
>

Not sure if you saw my latest response. This was resolved by switching
from blktap2 to phy (first time using iscsi, usually use disk images).

Probably not desired behavior still though. Blktap2 seemed to work fine,
with the exception of the hang on shutdown. (Performance benchmarks seem
to indicate phy and blktap2 are equal as well)

- Nathan

--
Nathan March <nathan@gt.net>
Gossamer Threads Inc. http://www.gossamer-threads.com/
Tel: (604) 687-5804 Fax: (604) 687-5806