Csillag Kristof
2010-Jul-10 22:35 UTC
[Xen-users] repeated kernel crashes with PCI passthru
Hi all,

I have recently upgraded one of my Debian servers from Xen 3.2 / kernel 2.6.26 to Xen 4.0 / kernel 2.6.32. I have configured PCI passthru for a NIC (the setup roughly follows the usual pattern sketched at the end of this mail). Since the current Debian pvops kernel does not have the Xen PCI frontend driver required for PCI passthru, I am running a Xen kernel in both dom0 and domU, so the actual kernel versions are:

dom0: 2.6.32-5-xen-amd64 #1 SMP Tue Jun 1
domU: 2.6.32-5-xen-686 #1 SMP Tue Jul 6

The hypervisor is 4.0.1-rc3.

(Random notes:
1. The dom0 is 64-bit, this domU is 32-bit.
2. The dom0 kernel is not the latest (-16), but the one before (-15), because the current one won't boot up; see #588509 and #588426.)

* * *

So, the system boots up as it should, but sometimes the domU crashes, with messages like these:

---------------------
[27047.101954] BUG: unable to handle kernel paging request at 00d90200
[27047.101979] IP: [<c11f01aa>] skb_release_data+0x71/0x90
[27047.102000] *pdpt = 0000000001c21027 *pde = 0000000000000000
[27047.102019] Thread overran stack, or stack corrupted
[27047.102031] Oops: 0000 [#1] SMP
[27047.102047] last sysfs file: /sys/devices/virtual/net/ppp0/uevent
[27047.102060] Modules linked in: tun xt_limit nf_nat_irc nf_nat_ftp ipt_LOG ipt_MASQUERADE xt_DSCP ipt_REJECT nf_conntrack_irc nf_conntrack_ftp xt_state xt_TCPMSS xt_tcpmss xt_tcpudp pppoe pppox ppp_generic slhc sundance mii iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables x_tables dm_snapshot dm_mirror dm_region_hash dm_log dm_mod loop evdev snd_pcsp snd_pcm snd_timer snd xen_netfront soundcore snd_page_alloc ext3 jbd mbcache thermal_sys xen_blkfront
[27047.102275]
[27047.102285] Pid: 0, comm: swapper Not tainted (2.6.32-5-xen-686 #1)
[27047.102298] EIP: 0061:[<c11f01aa>] EFLAGS: 00010206 CPU: 0
[27047.102310] EIP is at skb_release_data+0x71/0x90
[27047.102321] EAX: 00d90200 EBX: 00000000 ECX: c2939c10 EDX: cec6b500
[27047.102333] ESI: cf8f0a80 EDI: cf8f09c0 EBP: c13919c8 ESP: c1383eec
[27047.102346] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[27047.102358] Process swapper (pid: 0, ti=c1382000 task=c13c2ba0 task.ti=c13820
[27047.102371] Stack:
[27047.102379] cf8f0a80 c293a700 c11efdfb cf8f09c0 c11f4c35 00000011 c1380000 00000002
[27047.102415] <0> 00000008 c13919c8 c103c1ec c14594b0 00000001 0000000a 00000000 00000100
[27047.102455] <0> c1380000 00000000 c13c5d18 00000000 c103c2c4 00000000 c1383f5c c103c39a
[27047.102499] Call Trace:
[27047.102512] [<c11efdfb>] ? __kfree_skb+0xf/0x6e
[27047.102527] [<c11f4c35>] ? net_tx_action+0x58/0xf9
[27047.102542] [<c103c1ec>] ? __do_softirq+0xaa/0x151
[27047.102557] [<c103c2c4>] ? do_softirq+0x31/0x3c
[27047.102570] [<c103c39a>] ? irq_exit+0x26/0x58
[27047.102586] [<c1198a46>] ? xen_evtchn_do_upcall+0x22/0x2c
[27047.102604] [<c1009b7f>] ? xen_do_upcall+0x7/0xc
[27047.102630] [<c10023a7>] ? hypercall_page+0x3a7/0x1001
[27047.102647] [<c1006169>] ? xen_safe_halt+0xf/0x1b
[27047.102661] [<c10042bf>] ? xen_idle+0x23/0x30
[27047.102676] [<c1008168>] ? cpu_idle+0x89/0xa5
[27047.102691] [<c13fb80d>] ? start_kernel+0x318/0x31d
[27047.102706] [<c13fd3c3>] ? xen_start_kernel+0x615/0x61c
[27047.102721] [<c1409045>] ? print_local_APIC+0x61/0x380
[27047.102732] Code: 8b 44 02 30 e8 9a 4f ea ff 8b 96 a4 00 00 00 0f b7 42 04 39 c3 7c e5 8b 96 a4 00 00 00 8b 42 1c 85 c0 74 16 c7 42 1c 00 00 00 00 <8b> 18 e8 d2 fc ff ff 85 db 74 04 89 d8 eb f1 8b 86 a8 00 00 00
[27047.102981] EIP: [<c11f01aa>] skb_release_data+0x71/0x90 SS:ESP 0069:c1383eec
[27047.103003] CR2: 0000000000d90200
[27047.103018] ---[ end trace a577dfc0e629cd07 ]---
[27047.103028] Kernel panic - not syncing: Fatal exception in interrupt
[27047.103042] Pid: 0, comm: swapper Tainted: G D 2.6.32-5-xen-686 #1
[27047.103053] Call Trace:
[27047.103065] [<c128ae0d>] ? panic+0x38/0xe4
[27047.103078] [<c128d419>] ? oops_end+0x91/0x9d
[27047.103092] [<c1021b5a>] ? no_context+0x134/0x13d
[27047.103106] [<c1021c78>] ? __bad_area_nosemaphore+0x115/0x11d
[27047.103121] [<c10067f0>] ? check_events+0x8/0xc
[27047.103135] [<c10067e7>] ? xen_restore_fl_direct_end+0x0/0x1
[27047.103155] [<d0823fdb>] ? xennet_poll+0xaeb/0xb04 [xen_netfront]
[27047.103170] [<c10211df>] ? pvclock_clocksource_read+0xf9/0x10f
[27047.103185] [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
[27047.103200] [<c114a00f>] ? xen_swiotlb_unmap_page+0x0/0x7
[27047.103214] [<c10067f0>] ? check_events+0x8/0xc
[27047.103227] [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
[27047.103242] [<c128e3f4>] ? do_page_fault+0x115/0x307
[27047.103255] [<c128e2df>] ? do_page_fault+0x0/0x307
[27047.103268] [<c1021c8a>] ? bad_area_nosemaphore+0xa/0xc
[27047.103282] [<c128cb0b>] ? error_code+0x73/0x78
[27047.103295] [<c11f01aa>] ? skb_release_data+0x71/0x90
[27047.103308] [<c11efdfb>] ? __kfree_skb+0xf/0x6e
[27047.103321] [<c11f4c35>] ? net_tx_action+0x58/0xf9
[27047.103335] [<c103c1ec>] ? __do_softirq+0xaa/0x151
[27047.103348] [<c103c2c4>] ? do_softirq+0x31/0x3c
[27047.103361] [<c103c39a>] ? irq_exit+0x26/0x58
[27047.103374] [<c1198a46>] ? xen_evtchn_do_upcall+0x22/0x2c
[27047.103388] [<c1009b7f>] ? xen_do_upcall+0x7/0xc
[27047.103401] [<c10023a7>] ? hypercall_page+0x3a7/0x1001
[27047.103415] [<c1006169>] ? xen_safe_halt+0xf/0x1b
[27047.103428] [<c10042bf>] ? xen_idle+0x23/0x30
[27047.103440] [<c1008168>] ? cpu_idle+0x89/0xa5
[27047.103454] [<c13fb80d>] ? start_kernel+0x318/0x31d
[27047.103467] [<c13fd3c3>] ? xen_start_kernel+0x615/0x61c
[27047.103481] [<c1409045>] ? print_local_APIC+0x61/0x380
------------------------------------------------------------------------------------

Then, since the IRQ of the card is shared with the SATA controller, this basically kills the whole host, requiring a hardware reset. (Sometimes this second problem also occurs when I am rebooting the domU normally; see http://lists.xensource.com/archives/html/xen-devel/2009-07/msg00224.html for the thread about the shared IRQ problem.)

This happens once every few days, sometimes every few hours, basically making the whole system unusable.

* * *

Does anybody have any idea what could be happening here? How can I fix this?

Thank you for your help,
Kristof Csillag
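
P.S. For reference, the passthrough setup roughly follows the usual pciback + xm pattern sketched below. The device address 0000:00:19.0 is only a placeholder for illustration (not the real address of my NIC), and the pciback module name can differ between kernel flavours:

    # in dom0: detach the NIC from its dom0 driver and hand it to pciback
    # (the module may be called pciback or xen-pciback, depending on the kernel)
    modprobe pciback
    echo 0000:00:19.0 > /sys/bus/pci/devices/0000:00:19.0/driver/unbind
    echo 0000:00:19.0 > /sys/bus/pci/drivers/pciback/new_slot
    echo 0000:00:19.0 > /sys/bus/pci/drivers/pciback/bind

    # in the domU (xm) config file: pass the hidden device through to the guest,
    # where it is picked up by the Xen PCI frontend driver
    pci = [ '0000:00:19.0' ]

The IRQ sharing with the SATA controller mentioned above can be seen in dom0 with "cat /proc/interrupts" (and "lspci -v" for the devices involved).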