Gregor Pirnaver
2006-Sep-08 22:35 UTC
[Fedora-xen] Recurring Zombie (XenU used as a web server)
For some weeks I have had a problem with a XenU domain used as a web server. It keeps turning into a Zombie domain:

    # xm list
    Name                ID Mem(MiB) VCPUs State  Time(s)
    Domain-0             0      641     4 r-----   263.6
    Zombie-wwwmain       2      512     1 ----cd  5238.4
    dns                  4      128     1 -b----    20.1
    intranet             6      512     1 -b----    18.7
    ldap                 5      128     1 -b----    61.4
    mail                 3      512     1 r-----  1078.0
    www                  1      512     1 -b----   117.7
    wwwextra             7      512     1 -b----    33.2

(Both Xen0 and XenU are running FC5 with kernel 2.6.17-1.2174.)

- How can I find out what is causing this problem?
- How can I fix it, work around it, or report a bug?

This is a production machine, so I would appreciate any pointers.

I didn't have this problem before. Maybe the new kernel is causing it? Is there a way to get older versions of the kernel packages?

I would also like to note that when this happens, networking doesn't work in ANY XenU domain. For example, I open a console on the dns domain:

    # xm console dns

and, after logging in, try to ping a Google IP address:

    # ping 72.14.221.104
    PING 72.14.221.104 (72.14.221.104) 56(84) bytes of data.
    ping: sendmsg: No buffer space available
    ping: sendmsg: No buffer space available
    ping: sendmsg: No buffer space available
    ...

When I run "init 0" inside a XenU (e.g. in the dns XenU):

    # init 0

it hangs at this step:

    Removing module iptables:

When I try to shut down a XenU from Xen0, it also turns into a Zombie. For example, the dns XenU domain turns into Zombie-dns after:

    # xm shutdown dns

What can I do?
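Since the zombie only shows up in the `xm list` output, one way to notice it early on a production box is to scan that output from a script. The following is a minimal sketch, not part of the original post: the function names `parse_xm_list` and `find_zombies` are made up for illustration, the sample text is a shortened copy of the listing above, and it assumes the six-column layout shown there, domain names without spaces, and that a trailing `d` in the State column marks a dying domain (as in `----cd`).

```python
# Shortened copy of the `xm list` output from the post above.
SAMPLE = """\
Name                ID Mem(MiB) VCPUs State  Time(s)
Domain-0             0      641     4 r-----   263.6
Zombie-wwwmain       2      512     1 ----cd  5238.4
dns                  4      128     1 -b----    20.1
mail                 3      512     1 r-----  1078.0
"""


def parse_xm_list(text):
    """Parse `xm list` text into a list of dicts, skipping the header row.

    Assumes exactly six whitespace-separated columns per domain.
    """
    domains = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[0] == "Name":
            continue
        name, dom_id, mem, vcpus, state, secs = fields
        domains.append({
            "name": name,
            "id": int(dom_id),
            "mem_mib": int(mem),
            "vcpus": int(vcpus),
            "state": state,
            "time_s": float(secs),
        })
    return domains


def find_zombies(domains):
    """Return names of domains that look like zombies.

    A domain is flagged if its name starts with "Zombie-" or if the
    last State flag is "d" (assumed here to mean "dying").
    """
    return [d["name"] for d in domains
            if d["name"].startswith("Zombie-") or d["state"].endswith("d")]


if __name__ == "__main__":
    print(find_zombies(parse_xm_list(SAMPLE)))
```

In practice the text would come from running `xm list` in Xen0 (for example via `subprocess`) rather than from a hard-coded string; that part is left out here so the sketch runs anywhere.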
Rik van Riel
2006-Sep-09 17:08 UTC
Re: [Fedora-xen] Recurring Zombie (XenU used as a web server)
Gregor Pirnaver wrote:
> For some weeks I have a problem with XenU domain used as web server. It is
> turning into Zombie domain:
>
> # xm list
> Name ID Mem(MiB) VCPUs State Time(s)
> Domain-0 0 641 4 r----- 263.6
> Zombie-wwwmain 2 512 1 ----cd 5238.4

Is that a fully virtualized domain, or paravirt?

> (Both Xen0 and XenU are running FC5 with kernel 2.6.17-1.2174.)
>
> - How can I find out what is causing this problem?
> - How can I fix it / work around it / report a bug?

If it is paravirt, you can run "xm sysrq <domain> <key>" to get some debugging output.

If the domain in question is fully virt, do you have a stale qemu-dm hanging around?

--
What is important? What you want to be true, or what is true?
Russell McOrmond
2006-Sep-21 14:49 UTC
Re: [Fedora-xen] Recurring Zombie (XenU used as a web server)
Gregor Pirnaver wrote:
> For some weeks I have a problem with XenU domain used as web server. It is
> turning into Zombie domain:

I observed the same problem and filed a bug report about it, which is still active:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199944

Add yourself to the CC on that bug...

--
Russell McOrmond, Internet Consultant: <http://www.flora.ca/>
Please help us tell the Canadian Parliament to protect our property rights as owners of Information Technology. Sign the petition! http://www.digital-copyright.ca/petition/ict/
"The government, lobbied by legacy copyright holders and hardware manufacturers, can pry my camcorder, computer, home theatre, or portable media player from my cold dead hands!"
Kwan Lowe
2006-Sep-21 15:04 UTC
Re: [Fedora-xen] Recurring Zombie (XenU used as a web server)
> Gregor Pirnaver wrote:
>> For some weeks I have a problem with XenU domain used as web server. It is
>> turning into Zombie domain:
>
> I observed the same problem and filed a bug report about it which is
> active:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199944
>
> Add yourself to the CC on that bug...
>

This is the bug report I'd found... It looks like yours is the same issue:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=204468

--
* The Digital Hermit http://www.digitalhermit.com
* Unix and Linux Solutions kwan@digitalhermit.com
Russell McOrmond
2006-Sep-21 15:26 UTC
Re: [Fedora-xen] Recurring Zombie (XenU used as a web server)
>> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199944

Kwan Lowe wrote:
> This is the bug report I'd found... It looks like yours is the same issue:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=204468

I've added a note there. If people can leave their consoles open and see whether there is a kernel panic similar to the one that I reported, it may be the same xennet-related bug.

I've observed the same problem on a number of machines I administer (mixed Intel, AMD, etc.), and each time it seems to come down to the same xennet issue.

--
Russell McOrmond, Internet Consultant: <http://www.flora.ca/>
Adrian Chadd
2006-Sep-28 04:21 UTC
Re: [Fedora-xen] Recurring Zombie (XenU used as a web server)
On Thu, Sep 21, 2006, Russell McOrmond wrote:
> >> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199944
>
> Kwan Lowe wrote:
> >This is the bug report I'd found... It looks like yours is the same issue:
> >
> >https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=204468
>
> I've added a note there. If people can leave their consoles open and
> see if there is a kernel panic similar to the one that I reported, it
> may be the same xennet related bug.
>
> I've observed the same problem on a number of machines I administer
> (Mixed Intel, AMD, etc), and each time it seems to come down to the same
> xennet issue.

Here's a recent crash. I can't run 2187 because it crashes almost straight away (and I think I emailed the crashdump to the list already).

It's still worrying that this oops takes out networking for all VMs.

Adrian

BUG: unable to handle kernel NULL pointer dereference at virtual address 000000bc
 printing eip:
c908e1ad
*pde = ma 1509f067 pa 01ff3067
*pte = ma 00000000 pa fffff000
Oops: 0002 [#1]
SMP
Modules linked in: ipv6 xennet dm_snapshot dm_zero dm_mirror dm_mod raid1
CPU: 0
EIP: 0061:[<c908e1ad>] Not tainted VLI
EFLAGS: 00210046 (2.6.17-1.2157_FC5xenU #1)
EIP is at network_tx_buf_gc+0xc4/0x1b7 [xennet]
eax: 00000066 ebx: 00000032 ecx: c6540cfc edx: 00000000
esi: 00000001 edi: c6540400 ebp: 0000002c esp: c0651edc
ds: 007b es: 007b ss: 0069
Process swapper (pid: 0, threadinfo=c0650000 task=c05f1800)
Stack: <0>c6540cfc 00000000 00000000 00000004 c6540000 001ece31 001ece32 001ece12
       00000000 c6540488 c6540400 c6540000 c908f150 c01b5e80 00000000 00000000
       00000107 c043a57d 00000107 c6540000 c0651f88 c0651f88 00000107 c0643780
Call Trace:
 <c908f150> netif_int+0x24/0x66 [xennet]
 <c043a57d> handle_IRQ_event+0x42/0x85
 <c043a64d> __do_IRQ+0x8d/0xdc
 <c040665a> do_IRQ+0x1a/0x25
 <c0519efd> evtchn_do_upcall+0x66/0x9f
 <c0404d79> hypervisor_callback+0x3d/0x48
 <c0407a6a> safe_halt+0x84/0xa7
 <c0402bde> xen_idle+0x46/0x4e
 <c0402cfd> cpu_idle+0x94/0xad
 <c0655772> start_kernel+0x346/0x34c
Code: b4 9f 00 09 00 00 50 e8 9d c5 48 f7 c7 84 9f 00 09 00 00 00 00 00 00 8b 87 f4 00 00 00 89 84 9f f4 00 00 00 89 9f f4 00 00 00 90 <ff> 8d 90 00 00 00 0f 94 c0 83 c4 10 84 c0 74 62 bb 00 e0 ff ff
EIP: [<c908e1ad>] network_tx_buf_gc+0xc4/0x1b7 [xennet] SS:ESP 0069:c0651edc
 <0>Kernel panic - not syncing: Fatal exception in interrupt
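Russell asked earlier in the thread for people to check whether their console shows a panic similar to the one he reported. As a rough aid for that comparison, here is a small sketch that pulls the faulting function, its module, and the call-trace function names out of a saved console log. It is only an illustration: `crash_signature` and the regular expressions are made up for this example, the sample text is a trimmed copy of the oops above, and the patterns assume the "EIP is at func+0x.../0x... [module]" and "<address> func+0x.../0x..." shapes seen in that oops.

```python
import re

# Trimmed copy of the oops posted above, one call-trace frame per line.
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at virtual address 000000bc
EIP is at network_tx_buf_gc+0xc4/0x1b7 [xennet]
Call Trace:
 <c908f150> netif_int+0x24/0x66 [xennet]
 <c043a57d> handle_IRQ_event+0x42/0x85
Kernel panic - not syncing: Fatal exception in interrupt
"""

# "EIP is at <function>+<offset>/<size> [<module>]"; the module part is optional.
EIP_RE = re.compile(
    r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")

# "<address> <function>+<offset>/<size> [<module>]" as printed in the call trace.
FRAME_RE = re.compile(
    r"<[0-9a-f]{8}> (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")


def crash_signature(text):
    """Return the faulting function, module, and trace frames, or None.

    None is returned when no "EIP is at" line is found in the text.
    """
    match = EIP_RE.search(text)
    if match is None:
        return None
    frames = [func for func, _module in FRAME_RE.findall(text)]
    return {
        "function": match.group(1),
        "module": match.group(2),
        "frames": frames,
    }


if __name__ == "__main__":
    print(crash_signature(OOPS))
```

Two console logs that yield the same function and module (here `network_tx_buf_gc` in `xennet`) are good candidates for being the same bug, though matching signatures alone do not prove it.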