Ronald Moesbergen
2006-Nov-27 10:24 UTC
[Xen-users] Oops in Dom0 kernel when eth link fails
Hi,

While running two Xen machines with kernel 2.6.18-2 (the standard Xen kernels supplied by Debian unstable), I get the following oops in the Dom0 kernel when the ethernet link changes from up to down:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c02855ba
*pde = ma 00000000 pa fffff000
Oops: 0002 [#1]
SMP
Modules linked in: ip_vs_wrr ip_vs xt_physdev netconsole iptable_filter ip_tables x_tables bridge netloop drbd button ac battery loop shpchp pci_hotplug pcspkr serial_core serio_raw psmouse evdev tsdev ext3 jbd mbcache dm_mirror dm_snapshot dm_mod ide_cd cdrom generic usbhid cciss piix scsi_mod uhci_hcd ide_core bnx2 usbcore thermal processor fan
CPU: 0
EIP: 0061:[<c02855ba>] Not tainted VLI
EFLAGS: 00010286 (2.6.18-2-xen-686 #1)
EIP is at iret_exc+0x883/0xbe6
eax: 00000000 ebx: 00000000 ecx: 00000007 edx: c0ca0000
esi: c0ca0018 edi: c06d1890 ebp: 0000004c esp: c0315d0c
ds: 007b es: 007b ss: 0069
Process swapper (pid: 0, ti=c0314000 task=c02c9660 task.ti=c0314000)
Stack: 0000004c 000001d8 c0ca0000 c0227f6d c0ca0000 c06d1878 000001d8 00000000
       00000000 00000000 00000018 c06d1878 c71038ac 00000001 0000004c 000005dc
       c52fd53c 0000025f c02079fc 000001d8 c0315e38 00000224 c76fee80 0000022c
Call Trace:
 [<c0227f6d>] skb_copy_and_csum_bits+0x129/0x2a9
 [<c02079fc>] __alloc_skb+0x6c/0x70
 [<c02647a9>] icmp_glue_bits+0x1f/0x74
 [<c02496f8>] ip_append_data+0x5d1/0x942
 [<c026478a>] icmp_glue_bits+0x0/0x74
 [<c026467d>] icmp_push_reply+0x3d/0x14a
 [<c0243d86>] ip_route_output_flow+0x13/0x57
 [<c0264f6d>] icmp_send+0x2e7/0x350
 [<c012b60c>] run_posix_cpu_timers+0x1c/0x6bf
 [<c011495e>] rebalance_tick+0x116/0x2ae
 [<c0241b36>] ipv4_link_failure+0x14/0x3c
 [<c0262f1c>] arp_error_report+0x1c/0x24
 [<c0232c0d>] neigh_timer_handler+0x18e/0x24d
 [<c0232a7f>] neigh_timer_handler+0x0/0x24d
 [<c0121c28>] run_timer_softirq+0x101/0x15c
 [<c011de82>] __do_softirq+0x5e/0xc3
 [<c011df21>] do_softirq+0x3a/0x4a
 [<c01060c9>] do_IRQ+0x48/0x53
 [<c0206518>] evtchn_do_upcall+0x64/0x9b
 [<c01049d9>] hypervisor_callback+0x3d/0x48
 [<c01072c6>] raw_safe_halt+0x8c/0xaf
 [<c0102c63>] xen_idle+0x22/0x2e
 [<c0102d82>] cpu_idle+0x91/0xab
 [<c03196fe>] start_kernel+0x37a/0x381
Code: ff ff ff e9 a8 4f ef ff b8 f2 ff ff ff e9 c7 4f ef ff b8 f2 ff ff ff e9 e7 4f ef ff 8b 3d 20 0b 36 c0 e9 ef 93 ef ff 8b 5c 24 20 <c7> 03 f2 ff ff ff 8b 7c 24 14 8b 4c 24 18 31 c0 f3 aa e9 4b 0d
EIP: [<c02855ba>] iret_exc+0x883/0xbe6 SS:ESP 0069:c0315d0c
 <0>Kernel panic - not syncing: Fatal exception in interrupt

Some details about the setup:

The machines are linked by an ethernet cross-cable via eth1. eth0 on both machines links to the LAN, where clients connect to a virtual IP address managed by heartbeat. Both machines run one DomU providing services. Data replication is done with drbd over the eth1 link.

This is what happens:
- Both machines are running fine, one DomU per physical machine, load balanced.
- One of the machines has a (simulated) problem (poweroff -f).
- The second machine takes over all DomUs.

Then, seconds later, the above oops occurs and the second machine is down as well. Not quite as intended :)

My guess is that this has to do with the eth1 ethernet link failing: because of the cross-cable, eth1 on the surviving machine loses its link when the other machine powers off. But I could be wrong. The network driver used is bnx2; the network card is a 'Broadcom NetXtreme II BCM5708 1000Base-T (B1) PCI-X 64-bit 133MHz'.

I have tried to reproduce it on a non-Xen kernel, but couldn't. Someone also suggested that I disable tx checksumming in both DomUs, but that made no difference.

Below is some output of xm info and xm dmesg.
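As a reading aid for the call trace above: the kernel prints the most recent frame first, so the path that leads to the crash is easier to follow when reversed. The following is a minimal Python sketch of that, not a debugging tool. The names `FRAME_RE` and `call_chain` are made up for illustration, and the excerpt is a contiguous slice copied from the trace in this report.

```python
import re

# Matches one call-trace entry such as "[<c0264f6d>] icmp_send+0x2e7/0x350".
# The pattern assumes the 32-bit address format used in this oops.
FRAME_RE = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def call_chain(trace_text):
    """Return the symbol names found in an oops call trace, oldest frame first."""
    symbols = [match.group(2) for match in FRAME_RE.finditer(trace_text)]
    symbols.reverse()  # the oops lists the most recent frame first
    return symbols


# Contiguous excerpt of the call trace from this report.
TRACE_EXCERPT = """
[<c0241b36>] ipv4_link_failure+0x14/0x3c
[<c0262f1c>] arp_error_report+0x1c/0x24
[<c0232c0d>] neigh_timer_handler+0x18e/0x24d
[<c0232a7f>] neigh_timer_handler+0x0/0x24d
[<c0121c28>] run_timer_softirq+0x101/0x15c
[<c011de82>] __do_softirq+0x5e/0xc3
"""

if __name__ == "__main__":
    print(" -> ".join(call_chain(TRACE_EXCERPT)))
```

Read this way, the excerpt starts in the timer softirq, passes through the neighbour (ARP) timer handler and ends in ipv4_link_failure, which fits the guess that the failure is triggered by the peer on eth1 becoming unreachable.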

Xm info:

host                   : kalium
release                : 2.6.18-2-xen-686
version                : #1 SMP Thu Nov 9 00:21:32 UTC 2006
machine                : i686
nr_cpus                : 4
nr_nodes               : 1
sockets_per_node       : 1
cores_per_socket       : 2
threads_per_core       : 2
cpu_mhz                : 3200
hw_caps                : bfebfbff:20100000:00000000:00000180:0000e43d:00000000:00000001
total_memory           : 2047
free_memory            : 1379
xen_major              : 3
xen_minor              : 0
xen_extra              : .3-1
xen_caps               : xen-3.0-x86_32 hvm-3.0-x86_32
xen_pagesize           : 4096
platform_params        : virt_start=0xfc000000
xen_changeset          : Tue Oct 17 22:09:52 2006 +0100
cc_compiler            : gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)
cc_compile_by          : ultrotter
cc_compile_domain      : debian.org
cc_compile_date        : Thu Nov 2 20:28:13 CET 2006
xend_config_format     : 2

Xm dmesg:

Xen version 3.0.3-1 (Debian 3.0.3-0-2) (ultrotter@debian.org) (gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)) Thu Nov 2 20:28:13 CET 2006
Latest ChangeSet: Tue Oct 17 22:09:52 2006 +0100
(XEN) Command line: /boot/xen-3.0.3-1-i386.gz dom0_mem=128Mb
(XEN) Physical RAM map:
(XEN) 0000000000000000 - 000000000009f400 (usable)
(XEN) 000000000009f400 - 00000000000a0000 (reserved)
(XEN) 00000000000f0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 000000007ffc8000 (usable)
(XEN) 000000007ffc8000 - 000000007ffd0000 (ACPI data)
(XEN) 000000007ffd0000 - 0000000080000000 (reserved)
(XEN) 00000000fec00000 - 00000000fed00000 (reserved)
(XEN) 00000000fee00000 - 00000000fee10000 (reserved)
(XEN) 00000000ffc00000 - 0000000100000000 (reserved)
(XEN) System RAM: 2047MB (2096540kB)
(XEN) Xen heap: 10MB (10408kB)
(XEN) PAE disabled.
(XEN) found SMP MP-table at 000f4f80
(XEN) DMI 2.3 present.
(XEN) Using APIC driver default
(XEN) ACPI: RSDP (v002 HP ) @ 0x000f4f00
(XEN) ACPI: XSDT (v001 HP P58 0x00000002 Ò 0x0000162e) @ 0x7ffc8300
(XEN) ACPI: FADT (v003 HP P58 0x00000002 Ò 0x0000162e) @ 0x7ffc8380
(XEN) ACPI: SPCR (v001 HP SPCRRBSU 0x00000001 Ò 0x0000162e) @ 0x7ffc8100
(XEN) ACPI: MCFG (v001 HP ProLiant 0x00000001 0x00000000) @ 0x7ffc8180
(XEN) ACPI: HPET (v001 HP P58 0x00000002 Ò 0x0000162e) @ 0x7ffc81c0
(XEN) ACPI: SPMI (v005 HP ProLiant 0x00000001 Ò 0x0000162e) @ 0x7ffc8200
(XEN) ACPI: MADT (v001 HP 00000083 0x00000002 0x00000000) @ 0x7ffc8240
(XEN) ACPI: DSDT (v001 HP DSDT 0x00000001 INTL 0x20030228) @ 0x00000000
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) Processor #2 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
(XEN) Processor #1 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
(XEN) Processor #3 15:6 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] disabled)
(XEN) ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
(XEN) ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x09] address[0xfec80000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 9, version 32, address 0xfec80000, GSI 24-47
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode: Flat. Using 2 I/O APICs
(XEN) ACPI: HPET id: 0x10228201 base: 0xfed00000
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Initializing CPU#0
(XEN) Detected 3200.281 MHz processor.
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 0
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#0.
(XEN) CPU0: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU0: Thermal monitoring enabled
(XEN) CPU0: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Booting processor 1/2 eip 90000
(XEN) Initializing CPU#1
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 1
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#1.
(XEN) CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU1: Thermal monitoring enabled
(XEN) CPU1: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Booting processor 2/1 eip 90000
(XEN) Initializing CPU#2
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 0
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#2.
(XEN) CPU2: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU2: Thermal monitoring enabled
(XEN) CPU2: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Booting processor 3/3 eip 90000
(XEN) Initializing CPU#3
(XEN) CPU: Trace cache: 12K uops, L1 D cache: 16K
(XEN) CPU: L2 cache: 2048K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 1
(XEN) VMXON is done
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#3.
(XEN) CPU3: Intel P4/Xeon Extended MCE MSRs (24) available
(XEN) CPU3: Thermal monitoring enabled
(XEN) CPU3: Intel(R) Xeon(TM) CPU 3.20GHz stepping 04
(XEN) Total of 4 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) checking TSC synchronization across 4 CPUs: passed.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Brought up 4 CPUs
(XEN) Machine check exception polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Domain 0 kernel supports features = { 0000001f }.
(XEN) Domain 0 kernel requires features = { 00000000 }.
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 03000000->04000000 (28672 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: c0100000->c038b874
(XEN) Init. ramdisk: c038c000->c0eab200
(XEN) Phys-Mach map: c0eac000->c0ecc000
(XEN) Start info: c0ecc000->c0ecc46c
(XEN) Page tables: c0ecd000->c0ed2000
(XEN) Boot stack: c0ed2000->c0ed3000
(XEN) TOTAL: c0000000->c1000000
(XEN) ENTRY ADDRESS: c0100000
(XEN) Dom0 has maximum 4 VCPUs
(XEN) Initrd len 0xb1f200, start at 0xc038c000
(XEN) Scrubbing Free RAM: .....................done.
(XEN) Xen trace buffers: disabled
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).

Any clues as to what's wrong here? If more info is needed, please ask.

Thanks in advance.

Regards,
Ronald.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users