Russell McOrmond
2006-Jul-24 15:24 UTC
[Xen-users] Debugging a XenU that goes to Zombie state
Once last night and once the night before, a XenU went into a Zombie state, requiring a reboot. I'm not quite sure what is happening, and am looking for advice on how to diagnose the problem.

While writing this email, it crashed again. This time I had an 'xm console calcutta' capturing the output.

I read a suggestion in these archives that I can just restart xend to get things working again, but trying that gives me:

Going to boot Fedora Core (2.6.17-1.2145_FC5xenU)
  kernel: /boot/vmlinuz-2.6.17-1.2145_FC5xenU
  initrd: /boot/initrd-2.6.17-1.2145_FC5xenU.img
Error: Device 0 (vif) could not be connected. Hotplug scripts not working.

So I had to reboot everything.

Here is what I captured from the 'xm console':

BUG: unable to handle kernel NULL pointer dereference at virtual address 0000009a
 printing eip:
e10fd1ad
*pde = ma 08f98067 pa 17077067
*pte = ma 00000000 pa fffff000
Oops: 0002 [#1]
SMP
Modules linked in: ipv6 xennet ipt_REJECT xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat ip_nat ip_conntrack nfnetlink ip_tables x_tables dm_mirror dm_mod
CPU: 0
EIP: 0061:[<e10fd1ad>] Not tainted VLI
EFLAGS: 00010046 (2.6.17-1.2157_FC5xenU #1)
EIP is at network_tx_buf_gc+0xc4/0x1b7 [xennet]
eax: 00000011 ebx: 0000000c ecx: d9fc8cfc edx: 00000000
esi: 00000001 edi: d9fc8400 ebp: 0000000a esp: c0651d90
ds: 007b es: 007b ss: 0069
Process swapper (pid: 0, threadinfo=c0650000 task=c05f1800)
Stack: <0>d9fc8cfc 00000000 00000000 00000004 d9fc8000 0000f002 0000f003 0000effc
       00000000 d9fc8488 d9fc8400 d9fc8000 e10fe150 dba603e0 00000000 00000000
       00000108 c043a57d 00000108 d9fc8000 c0651e3c c0651e3c 00000108 c0643800
Call Trace:
 <e10fe150> netif_int+0x24/0x66 [xennet] <c043a57d> handle_IRQ_event+0x42/0x85
 <c043a64d> __do_IRQ+0x8d/0xdc <c040665a> do_IRQ+0x1a/0x25
 <c0519efd> evtchn_do_upcall+0x66/0x9f <c0404d79> hypervisor_callback+0x3d/0x48
 <e10fd9ca> network_alloc_rx_buffers+0x2c3/0x30b [xennet] <e10fe9ac> netif_poll+0x639/0x784 [xennet]
 <c055a3c5> net_rx_action+0xcd/0x1fe <c041d5bb> __do_softirq+0x70/0xef
 <c041d67a> do_softirq+0x40/0x67 <c040665f> do_IRQ+0x1f/0x25
 <c0519efd> evtchn_do_upcall+0x66/0x9f <c0404d79> hypervisor_callback+0x3d/0x48
 <c0407a6a> safe_halt+0x84/0xa7 <c0402bde> xen_idle+0x46/0x4e
 <c0402cfd> cpu_idle+0x94/0xad <c0655772> start_kernel+0x346/0x34c
Code: b4 9f 00 09 00 00 50 e8 9d d5 41 df c7 84 9f 00 09 00 00 00 00 00 00 8b 87 f4 00 00 00 89 84 9f f4 00 00 00 89 9f f4 00 00 00 90 <ff> 8d 90 00 00 00 0f 94 c0 83 c4 10 84 c0 74 62 bb 00 e0 ff ff
EIP: [<e10fd1ad>] network_tx_buf_gc+0xc4/0x1b7 [xennet] SS:ESP 0069:c0651d90
 <0>Kernel panic - not syncing: Fatal exception in interrupt

[root@westbengal ~]#

---cut---

Notes from before the most recent crash, to give machine context.

For various reasons this is the only XenU currently running on the machine, so I don't currently know whether other XenUs would have died if they had been on the same machine.

I'm running Fedora Core 5 on both the Xen0 and the XenU, on a dual-core Athlon box with 2G RAM.

powernow-k8: Found 2 AMD Athlon 64 / Opteron processors (version 1.60.2)

I created this XenU a few weeks ago by tar'ing up a server that wasn't running Xen, and unpacking it onto some LVM partitions. It ran fine for a while, and I expect that it is something I upgraded (YUM) that went bad, but I wanted to ask whether anyone else has seen anything unusual before I try backing out all recent changes to find out what happened.
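For anyone triaging a capture like the one above, here is a minimal sketch (not part of the original report) that pulls the faulting function and the module of each call-trace frame out of an 'xm console' capture. It assumes the capture was saved as text, possibly with literal "^M" markers. The names summarize_oops and SAMPLE are hypothetical, and labelling frames that carry no bracketed module name as "kernel" is a convention of this sketch only.

```python
import re

# "EIP is at network_tx_buf_gc+0xc4/0x1b7 [xennet]" -> function and optional module
EIP_RE = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")
# "<e10fe150> netif_int+0x24/0x66 [xennet]" -> function and optional module
FRAME_RE = re.compile(r"<[0-9a-f]{8}> (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")


def summarize_oops(text):
    """Return the faulting function/module and the call-trace frames."""
    # Strip carriage-return debris left by the console capture.
    text = text.replace("^M", "").replace("\r", "")
    eip = EIP_RE.search(text)
    start = text.find("Call Trace:")
    frames = FRAME_RE.findall(text[start:]) if start != -1 else []
    return {
        "faulting_function": eip.group(1) if eip else None,
        "faulting_module": eip.group(2) if eip else None,
        # Assumption of this sketch: no bracketed module means core kernel code.
        "frames": [(fn, mod or "kernel") for fn, mod in frames],
    }


# A shortened excerpt of the capture, used only to exercise the function.
SAMPLE = (
    "EIP is at network_tx_buf_gc+0xc4/0x1b7 [xennet] ^M"
    "Call Trace: ^M <e10fe150> netif_int+0x24/0x66 [xennet] "
    "<c043a57d> handle_IRQ_event+0x42/0x85 ^M"
)

if __name__ == "__main__":
    summary = summarize_oops(SAMPLE)
    print(summary["faulting_function"], summary["faulting_module"])
    for function, module in summary["frames"]:
        print(f"{module:>8}  {function}")
```

Run against a full capture, this gives a quick view of which frames sit in loadable modules and which are in the core kernel image, which is handy when comparing several crashes.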
Very boring config for XenU:

# FLORA.org server
name = "calcutta"
memory = "512"
# was 'phy:hdb,hda,w',
disk = [ 'phy:mapper/XenImages-CalcuttaSlash,hda1,w',
         'phy:mapper/XenImages-CalcuttaHome,hda2,w',
         'phy:mapper/XenSwap-CalcuttaSWP,hda3,w' ]
vif = [ 'mac=00:16:3e:5c:76:da' ]
bootloader="/usr/bin/pygrub"
on_reboot = 'restart'
on_crash = 'restart'

I don't know what all RedHat has patched into 2.6.17, but the following are the relevant RedHat versions: 2.6.17-1.2145_FC5xenU and 2.6.17-1.2157_FC5xenU.

I don't think the problem is with the kernel. July 11 is when I switched to Xen, and it worked until yesterday morning, when I had to force a reboot. I upgraded various packages (to the latest versions via Yum) on July 11, 17 and 18, and again after the first crash on July 23. (Yum nicely logs which packages it updates or installs.)

The following are the last 3 restarts (reading from /var/log/messages*, which agrees with my memory of things):

Jul 17 16:55:10 calcutta kernel: Linux version 2.6.17-1.2145_FC5xenU (brewbuilder@hs20-bc2-2.build.redhat.com) (gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)) #1 SMP Sat Jul 1 13:54:07 EDT 2006

Note: the YUM updates of July 17 and 18 came after this reboot, which is why I'm fairly sure the problem isn't with the kernel.

Then a reboot after a Zombie:

Jul 23 09:07:55 calcutta kernel: Linux version 2.6.17-1.2157_FC5xenU (brewbuilder@ls20-bc2-14.build.redhat.com) (gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)) #1 SMP Wed Jul 12 00:46:43 EDT 2006

I then did a yum update 'just in case' something had been fixed.
And another reboot this morning:

Jul 24 09:28:07 calcutta kernel: Linux version 2.6.17-1.2157_FC5xenU (brewbuilder@ls20-bc2-14.build.redhat.com) (gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)) #1 SMP Wed Jul 12 00:46:43 EDT 2006

My top suspects, being packages that touch the system in a way that could crash the kernel:

Jul 17 18:05:17 calcutta yum: Updated: glibc-common.i386 2.4-8
Jul 17 18:05:25 calcutta yum: Updated: glibc.i386 2.4-8
Jul 17 18:05:26 calcutta yum: Updated: glibc-headers.i386 2.4-8
Jul 17 18:05:26 calcutta yum: Updated: glibc-devel.i386 2.4-8
Jul 17 18:05:27 calcutta yum: Updated: glibc-utils.i386 2.4-8
Jul 17 18:14:04 calcutta yum: Updated: procps.i386 3.2.6-3.5
Jul 17 18:14:05 calcutta yum: Updated: psmisc.i386 22.2-1.1
Jul 18 11:26:24 calcutta yum: Updated: libsepol.i386 1.12.17-1.fc5
Jul 18 11:26:24 calcutta yum: Updated: libselinux.i386 1.30.3-4.fc5
Jul 18 11:26:24 calcutta yum: Updated: libselinux-python.i386 1.30.3-4.fc5

-- 
Russell McOrmond, Internet Consultant: <http://www.flora.ca/>
Please help us tell the Canadian Parliament to protect our property rights as owners of Information Technology. Sign the petition! http://www.digital-copyright.ca/petition/ict/

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users