I have run into a strange situation where a domain will not boot with a certain disk specified in the config file, and trying to block-attach that disk after the domain starts makes the domain disappear from the list, presumably because it crashed.

I am running CentOS 5.4 with kernel 2.6.18-160.el5xen x86_64.

For months everything worked perfectly with these domains using an AoE SAN for the back-end. I have used this sort of setup for several years and it is great, and these domains in particular had been running for several months. Then 3 of the 4 domU's I run were really heavily slammed and became unresponsive, and I ended up having to do an xm destroy on them. After that they refuse to come back up. One of my domU's has not been rebooted and it continues to work great with all 4 disk devices attached.

Here is my domU config file:

name = "db2"
uuid = "f253cab5-c3de-c1f7-e735-5d4f0bfcd3ff"
maxmem = 16384
memory = 2048
vcpus = 4
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ ]
disk = [ "phy:/dev/etherd/e1.12,xvda,w", "phy:/dev/etherd/e2.12,xvdb,w", "phy:/dev/etherd/e3.1,xvdc,w", "phy:/dev/etherd/e4.1,xvdd,w" ]
vif = [ "mac=00:16:3e:5b:5c:dd,bridge=dmz" ]

If I boot the domU with this config file I get the following on boot:

Red Hat nash version 5.1.19.6 starting
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Setting up hotplug.
Creating block device nodes.
Loading ehci-hcd.ko module
Loading ohci-hcd.ko module
Loading uhci-hcd.ko module
USB Universal Host Controller Interface driver v3.0
Loading jbd.ko module
Loading ext3.ko module
Loading raid1.ko module
md: raid1 personality registered for level 1
Loading xenblk.ko module
Registering block device major 202
xvda: xvda1 xvda2 xvda3 xvda4 < xvda5 >
xvdb: xvdb1 xvdb2 xvdb3 xvdb4 < xvdb5 >
xvdc: xvdc1
kobject_add failed for xvda with -EEXIST, don't try to register things with the same name in the same directory.
Call Trace:
[<ffffffff803404ea>] kobject_add+0x170/0x19b
[<ffffffff8025cfd5>] exact_lock+0x0/0x14
[<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
[<ffffffff802fb4e2>] register_disk+0x43/0x190
[<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
[<ffffffff80336c3a>] add_disk+0x34/0x3d
[<ffffffff88084ec9>] :xenblk:backend_changed+0x110/0x193
[<ffffffff803b32fa>] xenbus_read_driver_state+0x26/0x3b
[<ffffffff803b4bdb>] xenwatch_thread+0x0/0x135
[<ffffffff803b402d>] xenwatch_handle_callback+0x15/0x48
[<ffffffff803b4cf7>] xenwatch_thread+0x11c/0x135
[<ffffffff8029bb44>] autoremove_wake_function+0x0/0x2e
[<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
[<ffffffff80233bcd>] kthread+0xfe/0x132
[<ffffffff80260b2c>] child_rip+0xa/0x12
[<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
[<ffffffff80233acf>] kthread+0x0/0x132
[<ffffffff80260b22>] child_rip+0x0/0x12

Unable to handle kernel NULL pointer dereference at 0000000000000010 RIP:
[<ffffffff802fe512>] create_dir+0x11/0x1cf
PGD 7f1c9067 PUD 7f1ca067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /block/ram0/dev
CPU 1
Modules linked in: xenblk raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 9, comm: xenwatch Not tainted 2.6.18-164.el5xen #1
RIP: e030:[<ffffffff802fe512>] [<ffffffff802fe512>] create_dir+0x11/0x1cf
RSP: e02b:ffff880000fbfda0 EFLAGS: 00010282
RAX: ffff88007f31b870 RBX: ffff88007f3cd4f0 RCX: ffff880000fbfdd8
RDX: ffff88007f3cd4f8 RSI: 0000000000000000 RDI: ffff88007f3cd4f0
RBP: ffff88007f3cd4f0 R08: 0000000000000001 R09: ffff88000114c000
R10: ffffffff8029b92c R11: ffff880000fbfbb0 R12: ffff88007f3cd4f0
R13: ffff880000fbfdd8 R14: 0000000000000000 R15: ffff88007f31b870
FS: 0000000000000000(0000) GS:ffffffff805ca080(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000

The /dev/etherd/e4.1 backend to the xvdd device is present in the dom0 and works perfectly. I can access it from within the dom0 with no problem.

Something is confused. I would really like to avoid rebooting the dom0's if at all possible.
I have found that if I remove the "phy:/dev/etherd/e4.1,xvdd,w" from the disk = line the domU boots fine. But if I try to block-attach the missing device the domU dies instantly.

I have been looking for logs that might explain why it died but I cannot find anything relevant. I have googled the "don't try to register things with the same name in the same directory" error and found a few references to it, but none in the context of Xen.

Any advice would be greatly appreciated.

--
Tracy Reed
http://tracyreed.org

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
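Since the kernel complains about registering the same name twice, one cheap thing to rule out is a duplicate in the disk list itself. The following is only a minimal sketch and not part of the original post: it parses the disk = entries from the config above, computes the virtual-device number for each xvd name (assuming the usual layout of major 202 with 16 minors per disk, which matches the "Registering block device major 202" line in the boot log), and reports any frontend name or backend path that appears more than once. The helper names xvd_number and check_disks are made up for illustration.

```python
from collections import Counter

XVD_MAJOR = 202        # from "Registering block device major 202" in the boot log
MINORS_PER_DISK = 16   # assumption: 16 minors are reserved per xvd disk


def xvd_number(name):
    """Return the virtual-device number for a whole-disk name such as 'xvda'."""
    if len(name) != 4 or not name.startswith("xvd") or not "a" <= name[3] <= "p":
        raise ValueError("expected a name in xvda..xvdp, got %r" % name)
    index = ord(name[3]) - ord("a")
    return (XVD_MAJOR << 8) | (index * MINORS_PER_DISK)


def check_disks(disk):
    """Parse 'type:backend,frontend,mode' entries and list any duplicates."""
    parsed = []
    for entry in disk:
        backend, frontend, mode = entry.split(",")
        parsed.append((backend.split(":", 1)[1], frontend, mode))
    front_counts = Counter(frontend for _, frontend, _ in parsed)
    back_counts = Counter(backend for backend, _, _ in parsed)
    dup_front = [name for name, count in front_counts.items() if count > 1]
    dup_back = [path for path, count in back_counts.items() if count > 1]
    return parsed, dup_front, dup_back


# The disk list exactly as posted in the config above.
disk = [
    "phy:/dev/etherd/e1.12,xvda,w",
    "phy:/dev/etherd/e2.12,xvdb,w",
    "phy:/dev/etherd/e3.1,xvdc,w",
    "phy:/dev/etherd/e4.1,xvdd,w",
]

parsed, dup_front, dup_back = check_disks(disk)
for backend, frontend, mode in parsed:
    print(frontend, xvd_number(frontend), backend, mode)
print("duplicate frontends:", dup_front)
print("duplicate backends:", dup_back)
```

For the posted config both duplicate lists come back empty, so the config on its own does not seem to explain the -EEXIST on xvda; whatever registers the name a second time would have to come from somewhere else.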
On Thu, Jan 21, 2010 at 10:21:26PM -0800, Tracy Reed wrote:
> I have found that if I remove the "phy:/dev/etherd/e4.1,xvdd,w" from
> the disk = line the domU boots fine. But if I try to block-attach the
> missing device the domU dies instantly.
>
> I have been looking for logs that might explain something about why it
> died but I cannot find anything relevant.

Does it work if you attach some local LVM volume or file image (non-AOE) as xvdd?

Do you get errors in dom0 "dmesg"? How about dom0 /var/log/messages?
Do you get errors in dom0 "xm log"? How about "xm dmesg"?

-- Pasi
On Fri, Jan 22, 2010 at 10:00:18AM +0200, Pasi Kärkkäinen spake thusly:
> Does it work if you attach some local LVM volume or file image (non-AOE) as xvdd?
>
> Do you get errors in dom0 "dmesg"? How about dom0 /var/log/messages?
> Do you get errors in dom0 "xm log" ? How about "xm dmesg"?

Nothing in dom0 dmesg or /var/log/messages.

xm dmesg contains many things like:

(XEN) mm.c:625:d172 Non-privileged (172) attempt to map I/O space 000000f0
(XEN) mm.c:625:d53 Non-privileged (53) attempt to map I/O space 000000f0
(XEN) mm.c:625:d72 Non-privileged (72) attempt to map I/O space 000000f0
(XEN) mm.c:625:d90 Non-privileged (90) attempt to map I/O space 000000f0
(XEN) mm.c:625:d172 Non-privileged (172) attempt to map I/O space 000000f0

Not sure if this indicates a problem or is related to my current problem or not. Hard to tell since it isn't time-stamped.

In xm log I see things like:

[2010-01-21 21:51:30 xend 8993] INFO (image:137) buildDomain os=linux dom=178 vcpus=4
[2010-01-21 21:51:30 xend 8993] DEBUG (image:206) domid = 178
[2010-01-21 21:51:30 xend 8993] DEBUG (image:207) memsize 2048
[2010-01-21 21:51:30 xend 8993] DEBUG (image:208) image /var/lib/xen/boot_kernel.CkgrHX
[2010-01-21 21:51:30 xend 8993] DEBUG (image:209) store_evtchn = 1
[2010-01-21 21:51:30 xend 8993] DEBUG (image:210) console_evtchn = 2
[2010-01-21 21:51:30 xend 8993] DEBUG (image:211) cmdline = ro root=/dev/md0 console=xvc0
[2010-01-21 21:51:30 xend 8993] DEBUG (image:212) ramdisk /var/lib/xen/boot_ramdisk.UfCwIK
[2010-01-21 21:51:30 xend 8993] DEBUG (image:213) vcpus = 4
[2010-01-21 21:51:30 xend 8993] DEBUG (image:214) features =
[2010-01-21 21:51:30 xend 8993] DEBUG (blkif:27) exception looking up device number for xvda: [Errno 2] No such file or directory: \047/dev/xvda\047
[2010-01-21 21:51:30 xend 8993] DEBUG (DevController:110) DevController: writing {\047virtual-device\047: \04751712\047, \047device-type\047: \047disk\047, \047protocol\047: \047x86_64-abi\047, \047backend-id\047: \0470\047, \047state\047: \0471\047, \047backend\047: \047/local/domain/0/backend/vbd/178/51712\047} to /local/domain/178/device/vbd/51712.
[2010-01-21 21:51:30 xend 8993] DEBUG (DevController:112) DevController: writing {\047domain\047: \047db2\047, \047frontend\047: \047/local/domain/178/device/vbd/51712\047, \047format\047: \047raw\047, \047dev\047: \047xvda\047, \047state\047: \0471\047, \047params\047: \047/dev/etherd/e1.12\047, \047mode\047: \047w\047, \047online\047: \0471\047, \047frontend-id\047: \047178\047, \047type\047: \047phy\047} to /local/domain/0/backend/vbd/178/51712.
[2010-01-21 21:51:30 xend 8993] DEBUG (blkif:27) exception looking up device number for xvdb: [Errno 2] No such file or directory: \047/dev/xvdb\047
[2010-01-21 21:51:30 xend 8993] DEBUG (DevController:110) DevController: writing {\047virtual-device\047: \04751728\047, \047device-type\047: \047disk\047, \047protocol\047: \047x86_64-abi\047, \047backend-id\047: \0470\047, \047state\047: \0471\047, \047backend\047: \047/local/domain/0/backend/vbd/178/51728\047} to /local/domain/178/device/vbd/51728.
[2010-01-21 21:51:30 xend 8993] DEBUG (DevController:112) DevController: writing {\047domain\047: \047db2\047, \047frontend\047: \047/local/domain/178/device/vbd/51728\047, \047format\047: \047raw\047, \047dev\047: \047xvdb\047, \047state\047: \0471\047, \047params\047: \047/dev/etherd/e2.12\047, \047mode\047: \047w\047, \047online\047: \0471\047, \047frontend-id\047: \047178\047, \047type\047: \047phy\047} to /local/domain/0/backend/vbd/178/51728.
[2010-01-21 21:51:30 xend 8993] DEBUG (blkif:27) exception looking up device number for xvdc: [Errno 2] No such file or directory: \047/dev/xvdc\047

and also:

[2010-01-21 21:51:06 xend 8993] DEBUG (DevController:496) hotplugStatusCallback /local/domain/0/backend/vif/177/0/hotplug-status.
[2010-01-21 21:51:06 xend 8993] DEBUG (DevController:510) hotplugStatusCallback 1.
[2010-01-21 21:51:06 xend 8993] DEBUG (DevController:154) Waiting for devices usb.
[2010-01-21 21:51:06 xend 8993] DEBUG (DevController:154) Waiting for devices vbd.
[2010-01-21 21:51:06 xend 8993] DEBUG (DevController:160) Waiting for 51712.
[2010-01-21 21:51:06 xend 8993] DEBUG (DevController:496) hotplugStatusCallback /local/domain/0/backend/vbd/177/51712/hotplug-status.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:496) hotplugStatusCallback /local/domain/0/backend/vbd/177/51712/hotplug-status.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:510) hotplugStatusCallback 1.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:160) Waiting for 51728.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:496) hotplugStatusCallback /local/domain/0/backend/vbd/177/51728/hotplug-status.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:496) hotplugStatusCallback /local/domain/0/backend/vbd/177/51728/hotplug-status.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:510) hotplugStatusCallback 1.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:160) Waiting for 51744.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:496) hotplugStatusCallback /local/domain/0/backend/vbd/177/51744/hotplug-status.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:496) hotplugStatusCallback /local/domain/0/backend/vbd/177/51744/hotplug-status.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:510) hotplugStatusCallback 1.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:160) Waiting for 0.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:496) hotplugStatusCallback /local/domain/0/backend/vbd/177/0/hotplug-status.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:510) hotplugStatusCallback 1.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:154) Waiting for devices irq.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:154) Waiting for devices vkbd.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:154) Waiting for devices vfb.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:154) Waiting for devices pci.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:154) Waiting for devices ioports.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:154) Waiting for devices tap.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:154) Waiting for devices vtpm.
[2010-01-21 21:51:12 xend.XendDomainInfo 8993] WARNING (XendDomainInfo:965) Domain has crashed: name=db2 id=177.
[2010-01-21 21:51:12 xend.XendDomainInfo 8993] DEBUG (XendDomainInfo:832) Storing domain details: {\047console/ring-ref\047: \0475339667\047, \047console/port\047: \0472\047, \047cpu/3/availability\047: \047online\047, \047name\047: \047db2\047, \047console/limit\047: \0471048576\047, \047cpu/2/availability\047: \047online\047, \047vm\047: \047/vm/f253cab5-c3de-c1f7-e735-5d4f0bfcd3ff\047, \047domid\047: \047177\047, \047cpu/0/availability\047: \047online\047, \047memory/target\047: \0472097152\047, \047store/ring-ref\047: \0475973074\047, \047cpu/1/availability\047: \047online\047, \047store/port\047: \0471\047}
[2010-01-21 21:51:12 xend.XendDomainInfo 8993] ERROR (XendDomainInfo:1896) VM db2 restarting too fast (8.055686 seconds since the last restart). Refusing to restart to avoid loops.

I'm pretty sure these are related to the crashes I saw.

--
Tracy Reed
http://tracyreed.org
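One detail in the second log excerpt may be worth a closer look. For domain 177, xend waits on vbd devices 51712, 51728 and 51744, and then on a vbd numbered 0, whereas a fourth disk named xvdd would normally be 51760 (major 202, 16 minors per disk). Whether that has anything to do with the crash is not established anywhere in this thread. As a rough sketch, and not something posted by anyone here, the Python below pulls the vbd ids out of `xm log` text and flags any that do not fit the expected xvd numbering. The function names are hypothetical, and the regular expression assumes the hotplug-status path format shown in the excerpt.

```python
import re

XVD_MAJOR = 202        # major number reported by xenblk in the domU boot log
MINORS_PER_DISK = 16   # assumption: 16 minors are reserved per xvd disk

# Matches e.g. ".../backend/vbd/177/51712/hotplug-status" (domid, then device id).
VBD_RE = re.compile(r"backend/vbd/(\d+)/(\d+)/hotplug-status")


def vbd_ids_by_domain(log_text):
    """Map each domain id to the vbd device ids seen for it, in first-seen order."""
    result = {}
    for domid, dev in VBD_RE.findall(log_text):
        devs = result.setdefault(int(domid), [])
        if int(dev) not in devs:
            devs.append(int(dev))
    return result


def unexpected_ids(dev_ids, disk_count):
    """Return ids that are not among the first disk_count xvd device numbers."""
    expected = {(XVD_MAJOR << 8) | (i * MINORS_PER_DISK) for i in range(disk_count)}
    return [dev for dev in dev_ids if dev not in expected]


# A few lines copied from the excerpt above; the vif line is ignored on purpose.
sample = """\
[2010-01-21 21:51:06 xend 8993] DEBUG (DevController:496) hotplugStatusCallback /local/domain/0/backend/vif/177/0/hotplug-status.
[2010-01-21 21:51:06 xend 8993] DEBUG (DevController:496) hotplugStatusCallback /local/domain/0/backend/vbd/177/51712/hotplug-status.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:496) hotplugStatusCallback /local/domain/0/backend/vbd/177/51728/hotplug-status.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:496) hotplugStatusCallback /local/domain/0/backend/vbd/177/51744/hotplug-status.
[2010-01-21 21:51:07 xend 8993] DEBUG (DevController:496) hotplugStatusCallback /local/domain/0/backend/vbd/177/0/hotplug-status.
"""

seen = vbd_ids_by_domain(sample)
for domid, dev_ids in seen.items():
    print("domain", domid, "vbd ids:", dev_ids)
    print("not matching xvda..xvdd:", unexpected_ids(dev_ids, 4))
```

To try it against a real log, the output of `xm log` could be saved to a file and its contents passed to vbd_ids_by_domain in place of the sample string.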