Ray Barnes
2008-Apr-05 10:29 UTC
[Xen-devel] BUG(?): multipathd confusion leads to kernel panic in Xen 3.2.1-rc2
Hi all. While playing with iSCSI Enterprise Target + multipathd on CentOS
5.1 (both the target and the initiator/multipath/xen box are Cent 5.1), I
encountered a strange fault condition that leads to a kernel panic in a
version of Xen 3.2.1-rc2 pulled from hg a couple of days ago. My lab consists
of two Clovertown machines with dual GigE into separate switches. The
target box is softraid5 (although I was able to reproduce this using a
single drive on the target), and runs a default config of IET, i.e.
'yum -y install scsi-target-utils ; /etc/init.d/tgtd start'. The config on the
initiator needed to reproduce this is the default, to the best of my
recollection (the basic discovery/login is sketched below, after the multipath
config). Xen was compiled with 'make world XEN_TARGET_X86_PAE=y vmxassist=n'.
/etc/multipath.conf is as follows:
defaults {
        udev_dir                /dev
        polling_interval        2
        selector                "round-robin 0"
        path_grouping_policy    multibus
#       getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /bin/true
        path_checker            readsector0
        rr_min_io               10
        rr_weight               priorities
        failback                2
        no_path_retry           fail
        user_friendly_names     no
}
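For completeness, the initiator side was just the stock open-iscsi
discovery/login against each portal, something like the following (the portal
addresses and the IQN are placeholders, not my actual lab values):

# discover the target through both NICs/switches
iscsiadm -m discovery -t sendtargets -p 192.168.0.10
iscsiadm -m discovery -t sendtargets -p 192.168.1.10
# log in to the discovered target over each portal
iscsiadm -m node -T iqn.2008-04.com.example:storage.lun1 -p 192.168.0.10 -l
iscsiadm -m node -T iqn.2008-04.com.example:storage.lun1 -p 192.168.1.10 -l

Afterward, 'multipath -ll' should show a single map with both paths active.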
The default parameter node.conn[0].timeo.noop_out_interval = 10 on the
initiator tells it to "ping" the target once every 10 seconds, and per
node.conn[0].timeo.noop_out_timeout = 15, to wait 15 seconds before marking
the target down. So most of the time it can figure out a path is down in
about 20 seconds, but if you catch it just wrong (a failure right after a
ping, so up to a full 10-second interval passes before the next ping even
goes out) it'll take 25 seconds. Add to that the 2-second polling_interval
in multipathd.
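For reference, both of those knobs live in /etc/iscsi/iscsid.conf on Cent 5.1
(if memory serves); the stock values are the ones quoted above:

# send a NOP-Out "ping" to the target every 10 seconds...
node.conn[0].timeo.noop_out_interval = 10
# ...and mark the connection bad if no reply arrives within 15 seconds
node.conn[0].timeo.noop_out_timeout = 15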
What seems to happen is that when I yank an Ethernet cable, multipathd gets
confused and takes 30+ seconds to figure things out (this could be a bug in
multipathd). But when that happens, a kernel panic ensues (see below). I have
been able to
reproduce this in the version of Xen that comes with Cent 5.1, as well as
3.2.0 and 3.2.1-rc2 pulled from hg a couple of days ago with a fresh pull of
2.6.18.8. I can very easily reproduce this every time while installing Cent
5.1 into a domU; it's probably happened 10 times thus far. I can also
reproduce it easily with a 'dd' inside of a domU that gets its filesystem from
the initiator/multipathd, simply by yanking and replugging one of the
Ethernet cables a few times. I also reproduced it once just running 'dd'
directly against the multipathed target device in /dev/mapper from within
the dom0 (see the sketch below). However, I tried very hard to reproduce this
inside the latest non-Xen kernel of CentOS 5.1 and I could not. It appears to
be a Xen issue, which under no circumstances should crash the entire box. In
a final
effort to add more substance and background to this, I attempted to yank
both cables while running 'dd' in the dom0 against the target. Although it
threw a bunch of errors, I did not make it panic. After multipathd marked
both paths down, the 'dd' process failed with an I/O error, which is expected
behavior. Same thing inside the domU.
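For what it's worth, the in-dom0 'dd' was something along these lines (the
map name is a placeholder; with user_friendly_names off, the real device
shows up under its WWID):

# hammer the multipathed LUN with sustained reads, then yank/replug a cable
dd if=/dev/mapper/<wwid-of-lun> of=/dev/null bs=1M

Any sustained I/O against the dm device seems to do it, so long as one path
flaps underneath it.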
Hopefully this helps *someone*. Rather than filing a bug report first, I
wanted to describe this here so you guys could maybe blame it on multipath
or tell me to go jump in Lake Minnetonka. If I can provide any more
background on this, please let me know, as I should have this lab setup for
several more days.
Sincerely,
Ray Barnes
P.S. I'm *extremely* pleased with the quality and quantity of good work
going on with Xen in the public domain nowadays; keep up the good work!
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c0302709
27935000 -> *pde = 00000001:17898001
27298000 -> *pme = 00000000:00000000
Oops: 0002 [#1]
SMP
Modules linked in: xt_physdev iptable_filter ip_tables bridge autofs4 hidp
rfcomm l2cap bluetooth sunrpc ip6t_REJECT xt_tcpudp ip6table_filter
ip6_tables x_tables ipv6 ib_iser rdma_cm ib_addr ib_cm ib_sa ib_mad ib_core
iscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc dm_mirror dm_round_robin
dm_multipath dm_mod video thermal sbs processor i2c_ec fan container button
battery asus_acpi ac lp nvram sg evdev e1000 parport_pc parport i2c_i801
i2c_core pcspkr piix serio_raw sisfb shpchp pci_hotplug 8250_pnp 8250
serial_core rtc ide_disk ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd
ohci_hcd uhci_hcd usbcore
CPU: 1
EIP: 0061:[<c0302709>] Not tainted VLI
EFLAGS: 00010286 (2.6.18.8-xen #1)
EIP is at iret_exc+0xc6a/0x105e
eax: 00000000 ebx: 00000000 ecx: 00000007 edx: ed470b40
esi: ed470b50 edi: e68c6490 ebp: 000001f0 esp: ed7a3c84
ds: 007b es: 007b ss: 0069
Process swapper (pid: 0, ti=ed7a2000 task=ed79f080 task.ti=ed7a2000)
Stack: 00000034 000001f0 ed470000 c0296f70 ed470b20 e68c6460 000001f0 00000000
       00000000 00000000 e68c6460 00000034 e7de98ac 00000000 00000034 00000514
       e7f8d594 c0294149 e68c642c ed7a3dbc e7f73440 00000000 c02df3d4 00000224
Call Trace:
[<c0296f70>] skb_copy_and_csum_bits+0x140/0x320
[<c0294149>] sock_alloc_send_skb+0x169/0x1c0
[<c02df3d4>] icmp_glue_bits+0x34/0xa0
[<c02be7b3>] ip_append_data+0x623/0xa60
[<c02df3a0>] icmp_glue_bits+0x0/0xa0
[<c02df286>] icmp_push_reply+0x56/0x170
[<c02b7ea1>] ip_route_output_flow+0x21/0x90
[<c02dfc7d>] icmp_send+0x2cd/0x3f0
[<c013d260>] hrtimer_wakeup+0x0/0x20
[<c02b5eec>] ipv4_link_failure+0x1c/0x50
[<c02dd49c>] arp_error_report+0x1c/0x30
[<c02a4158>] neigh_timer_handler+0xf8/0x2c0
[<c012fb0b>] run_timer_softirq+0x13b/0x1f0
[<c02a4060>] neigh_timer_handler+0x0/0x2c0
[<c012a562>] __do_softirq+0x92/0x130
[<c012a679>] do_softirq+0x79/0x80
[<c0107714>] do_IRQ+0x44/0xa0
[<c0248540>] evtchn_do_upcall+0xe0/0x1f0
[<c0105bbd>] hypervisor_callback+0x3d/0x45
[<c0108c7a>] raw_safe_halt+0x9a/0x120
[<c0104709>] xen_idle+0x29/0x50
[<c01036dd>] cpu_idle+0x6d/0xc0
Code: ff e9 f7 6f ea ff 8b 1d 80 32 41 c0 e9 ea c5 ea ff 8b 1d 80 32 41 c0 e9 ff c5 ea ff 8b 15 80 32 41 c0 e9 14 c6 ea ff 8b 5c 24 20 <c7> 03 f2 ff ff ff 8b 7c 24 14 8b 4c 24 18 31 c0 f3 aa e9 60 a7
EIP: [<c0302709>] iret_exc+0xc6a/0x105e SS:ESP 0069:ed7a3c84
<0>Kernel panic - not syncing: Fatal exception in interrupt
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.