2010/7/6 Csillag Kristof <csillag.kristof@gmail.com>
> Hi all,
>
> I just had a kernel crash on a Xen DomU, on a server that had been running
> for 4 days.
>
> (My current uptime record is 248 days, so this was not expected; then
> again, that was before I upgraded from (Xen 3.2 / kernel 2.6.26) to
> (Xen 4.0 / kernel 2.6.32).)
>
> * * *
>
> I run Xen hypervisor version 4.0.0 (Debian 4.0.0-2) and the Linux kernel
> 2.6.32-5-xen (Debian 2.6.32-15) on both the Dom0 and the (PV) DomU: the
> -amd64 flavour on the Dom0 and the -686 flavour on the DomU.
>
> (The DomU runs a Xen-flavour kernel because I have a PCI NIC passed
> through to it, and the current Debian 2.6.32 pv_ops kernel does not
> contain the required pcifront driver.)
>
> Here is what the DomU kernel said, copied from the output of "xm
> console":
>
> ------------------
>
> [403163.914167] ------------[ cut here ]------------
> [403163.914186] kernel BUG at /build/buildd-linux-2.6_2.6.32-15-i386-fb7Hfg/linux-2.6-2.6.32/debian/build/source_i386_xen/mm/slub.c:2969!
> [403163.914205] invalid opcode: 0000 [#1] SMP
> [403163.914222] last sysfs file: /sys/devices/virtual/net/ppp0/uevent
> [403163.914236] Modules linked in: tun xt_limit nf_nat_irc nf_nat_ftp
> ipt_LOG ipt_MASQUERADE xt_DSCP ipt_REJECT nf_conntrack_irc
> nf_conntrack_ftp xt_state xt_TCPMSS xt_tcpmss xt_tcpudp pppoe pppox
> ppp_generic slhc sundance iptable_nat nf_nat nf_conntrack_ipv4
> nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables
> x_tables dm_snapshot dm_mirror dm_region_hash dm_log dm_mod loop evdev
> snd_pcsp snd_pcm snd_timer snd xen_netfront soundcore snd_page_alloc
> ext3 jbd mbcache thermal_sys xen_blkfront mii [last unloaded: sundance]
> [403163.914455]
> [403163.914465] Pid: 0, comm: swapper Not tainted (2.6.32-5-xen-686 #1)
> [403163.914478] EIP: 0061:[<c10b73ec>] EFLAGS: 00010246 CPU: 0
> [403163.914492] EIP is at kfree+0x69/0xde
> [403163.914502] EAX: 40000000 EBX: c1c56a80 ECX: c145942c EDX: c1575c40
> [403163.914514] ESI: c2262000 EDI: c11f09eb EBP: c138f9c8 ESP: c1381ed4
> [403163.914527] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
> [403163.914539] Process swapper (pid: 0, ti=c1380000 task=c13c0ba0 task.ti=c1380000)
> [403163.914552] Stack:
> [403163.914560] c1575c40 c8829f1d c13bf520 c13bf520 c1c56a80 c163a654 00000000 c138f9c8
> [403163.914597] <0> c11f09eb 00000000 c11f5825 c13bf520 c1380000 00000002 00000008 c138f9c8
> [403163.914636] <0> c103d004 c1457408 00000001 0000000a 00000000 00000100 c1380000 00000000
> [403163.914680] Call Trace:
> [403163.914700] [<c8829f1d>] ? xennet_interrupt+0x4d/0x57 [xen_netfront]
> [403163.914717] [<c11f09eb>] ? __kfree_skb+0xf/0x6e
> [403163.914732] [<c11f5825>] ? net_tx_action+0x58/0xf9
> [403163.914748] [<c103d004>] ? __do_softirq+0xaa/0x151
> [403163.914762] [<c103d0dc>] ? do_softirq+0x31/0x3c
> [403163.914776] [<c103d1b2>] ? irq_exit+0x26/0x58
> [403163.914791] [<c1199636>] ? xen_evtchn_do_upcall+0x22/0x2c
> [403163.914816] [<c1009b7f>] ? xen_do_upcall+0x7/0xc
> [403163.914832] [<c10023a7>] ? hypercall_page+0x3a7/0x1001
> [403163.914847] [<c1006169>] ? xen_safe_halt+0xf/0x1b
> [403163.914861] [<c10042bf>] ? xen_idle+0x23/0x30
> [403163.914875] [<c1008168>] ? cpu_idle+0x89/0xa5
> [403163.914890] [<c13f980d>] ? start_kernel+0x318/0x31d
> [403163.914905] [<c13fb3c3>] ? xen_start_kernel+0x615/0x61c
> [403163.914920] [<c1409045>] ? efi_init+0xb4/0x580
> [403163.914930] Code: 86 00 00 00 40 c1 e8 0c c1 e0 05 01 d0 89 04 24 66 83 38 00 79 06 8b 40 0c 89 04 24 8b 14 24 8b 02 84 c0 78 19 66 a9 00 c0 75 04 <0f> 0b eb fe 8b 04 24 83 c4 10 5b 5e 5f 5d e9 c7 e9 fd ff 8b 04
> [403163.915175] EIP: [<c10b73ec>] kfree+0x69/0xde SS:ESP 0069:c1381ed4
> [403163.915201] ---[ end trace c5944bb691c7520c ]---
> [403163.915212] Kernel panic - not syncing: Fatal exception in interrupt
> [403163.915225] Pid: 0, comm: swapper Tainted: G D 2.6.32-5-xen-686 #1
> [403163.915237] Call Trace:
> [403163.915247] [<c128c4e1>] ? panic+0x38/0xe4
> [403163.915261] [<c100bf56>] ? oops_end+0x91/0x9d
> [403163.915275] [<c100a0d3>] ? do_invalid_op+0x0/0x75
> [403163.915288] [<c100a13f>] ? do_invalid_op+0x6c/0x75
> [403163.915301] [<c10b73ec>] ? kfree+0x69/0xde
> [403163.915315] [<c12075a5>] ? sch_direct_xmit+0x69/0x10c
> [403163.915329] [<c11f8095>] ? dev_queue_xmit+0x260/0x38e
> [403163.915343] [<c103d1fa>] ? _local_bh_enable_ip+0x16/0x6e
> [403163.915357] [<c11f8191>] ? dev_queue_xmit+0x35c/0x38e
> [403163.915371] [<c1021d2e>] ? pvclock_clocksource_read+0x48/0xa7
> [403163.915387] [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
> [403163.915401] [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
> [403163.915416] [<c128e1d3>] ? error_code+0x73/0x78
> [403163.915429] [<c11f09eb>] ? __kfree_skb+0xf/0x6e
> [403163.915442] [<c10b73ec>] ? kfree+0x69/0xde
> [403163.915461] [<c8829f1d>] ? xennet_interrupt+0x4d/0x57 [xen_netfront]
> [403163.915476] [<c11f09eb>] ? __kfree_skb+0xf/0x6e
> [403163.915489] [<c11f5825>] ? net_tx_action+0x58/0xf9
> [403163.915503] [<c103d004>] ? __do_softirq+0xaa/0x151
> [403163.915517] [<c103d0dc>] ? do_softirq+0x31/0x3c
> [403163.915530] [<c103d1b2>] ? irq_exit+0x26/0x58
> [403163.915543] [<c1199636>] ? xen_evtchn_do_upcall+0x22/0x2c
> [403163.915556] [<c1009b7f>] ? xen_do_upcall+0x7/0xc
> [403163.915570] [<c10023a7>] ? hypercall_page+0x3a7/0x1001
> [403163.915584] [<c1006169>] ? xen_safe_halt+0xf/0x1b
> [403163.915597] [<c10042bf>] ? xen_idle+0x23/0x30
> [403163.915609] [<c1008168>] ? cpu_idle+0x89/0xa5
> [403163.915623] [<c13f980d>] ? start_kernel+0x318/0x31d
> [403163.915637] [<c13fb3c3>] ? xen_start_kernel+0x615/0x61c
> [403163.915650] [<c1409045>] ? efi_init+0xb4/0x580
>
> ------------------
>
> Meanwhile, the Dom0 kernel has said this:
>
> -----------------
> [407187.550176] irq 17: nobody cared (try booting with the "irqpoll" option)
> [407187.550217] Pid: 1940, comm: xend Tainted: G W 2.6.32-5-xen-amd64 #1
> [407187.550253] Call Trace:
> [407187.550279] <IRQ> [<ffffffff810972dd>] ? __report_bad_irq+0x30/0x7d
> [407187.550324] [<ffffffff8109742f>] ? note_interrupt+0x105/0x16e
> [407187.550359] [<ffffffff81097b36>] ? handle_level_irq+0x80/0xc3
> [407187.550394] [<ffffffff811f1a58>] ? __xen_evtchn_do_upcall+0xe1/0x167
> [407187.550430] [<ffffffff811f22e5>] ? xen_evtchn_do_upcall+0x2e/0x42
> [407187.550430] [<ffffffff81012cfe>] ? xen_do_hypervisor_callback+0x1e/0x30
> [407187.550430] <EOI>
> [407187.550430] handlers:
> [407187.550430] [<ffffffffa00c484e>] (piix_interrupt+0x0/0x192 [ata_piix])
> [407187.550430] [<ffffffffa00c484e>] (piix_interrupt+0x0/0x192 [ata_piix])
> [407187.550430] Disabling IRQ #17
>
> -----------------
>
> The IRQ #17 mentioned above belongs to the passed-through PCI NIC.
> (I am not using an IOMMU, since my motherboard does not support it.)
>
> I rebooted the DomU (using xm reset), but the passed-through NIC never
> worked again, so eventually I had to reboot the whole physical machine.
>
> * * *
>
> Any idea what could cause this?
>
> Thank you for your help,
>
> Kristof Csillag
>
>
At the risk of just saying "me too", I would like to second this report;
we see many errors like this one:
irq 124: nobody cared (try booting with the "irqpoll" option)
It appears that any card that generates a high rate of MSI interrupts
causes this message. In our case it's a Tachyon FC card.
It seems specific to pv_ops kernels, as we did not have this problem with
the HVM kernel.
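For reference, the rule behind the "nobody cared" message can be sketched roughly as below. This is a simplified Python model, not kernel code, of the spurious-interrupt check in note_interrupt() (kernel/irq/spurious.c) as it looks in 2.6.32-era kernels: the kernel counts interrupts on each line and, after every 100,000 of them, disables the line if more than 99,900 were not claimed (IRQ_HANDLED) by any registered handler; an unhandled interrupt arriving more than HZ/10 jiffies after the previous unhandled one restarts the unhandled count at 1. The class name IrqLine, the HZ value of 250 and the jiffies values fed in are illustrative assumptions. In the Dom0 log above, the only handlers listed for IRQ 17 are piix_interrupt (ata_piix), so NIC interrupts that reach Dom0 on that line without being claimed there would count as unhandled under this rule.

```python
from dataclasses import dataclass

HZ = 250                   # assumption: real HZ is a kernel config option
COUNT_WINDOW = 100_000     # interrupts examined before each decision
UNHANDLED_LIMIT = 99_900   # more unhandled than this => line considered stuck


@dataclass
class IrqLine:
    """Hypothetical stand-in for the per-IRQ counters kept in struct irq_desc."""
    irq: int
    irq_count: int = 0
    irqs_unhandled: int = 0
    last_unhandled: int = -(10**9)  # jiffies of the previous unhandled interrupt
    disabled: bool = False

    def note_interrupt(self, handled: bool, jiffies: int) -> None:
        """Record one interrupt; disable the line if nearly all went unhandled."""
        if self.disabled:
            return
        if not handled:
            # An isolated spurious interrupt (more than HZ/10 after the previous
            # unhandled one) restarts the count instead of accumulating.
            if jiffies > self.last_unhandled + HZ // 10:
                self.irqs_unhandled = 1
            else:
                self.irqs_unhandled += 1
            self.last_unhandled = jiffies
        self.irq_count += 1
        if self.irq_count < COUNT_WINDOW:
            return
        # End of a 100,000-interrupt window: decide, then reset the counters.
        self.irq_count = 0
        if self.irqs_unhandled > UNHANDLED_LIMIT:
            print(f'irq {self.irq}: nobody cared (try booting with the "irqpoll" option)')
            print(f"Disabling IRQ #{self.irq}")
            self.disabled = True
        self.irqs_unhandled = 0


if __name__ == "__main__":
    # A flood of interrupts that no handler claims, arriving close together.
    line = IrqLine(irq=17)
    for i in range(COUNT_WINDOW):
        line.note_interrupt(handled=False, jiffies=i // 100)
    print(line.disabled)  # True
```

In this model the interrupt rate by itself never trips the check: a line where every interrupt is handled stays enabled no matter how busy it is, and only a window in which almost all interrupts went unclaimed gets the line disabled.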
-Bruce
> _______________________________________________
> Xen-users mailing list
> Xen-users@lists.xensource.com
> http://lists.xensource.com/xen-users
>