dongkyu lee
2010-May-31 10:37 UTC
[Xen-users] Kernel panic occurs when multiple VMs are booting together
This error is not 100% reproducible, but the kernel panic has occurred many times over the last two months. It usually happens when multiple VMs (in our case, 14 VMs) are booting together. It began to happen after we made three big changes to gain more availability.

The changes are:

1. Use *NAS* as VM storage for migration (previously local disk)
2. Use *two bondings* for switch HA, with 4 NICs (previously no bonding)
3. Use *Out of Band (OOB)* access in case we are unable to connect by ssh/rsh

We thought this would be a safer system than before, but apparently it is not.

Any advice would be appreciated.

<System Information>

1. Xen Version : 3.4.1
2. dom0 memory is set to "dom0_mem=2G"
3. Physical Server Model: HP DL360G6 (Nehalem Server, 48GB RAM)

<Panic Message>

blkback: ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
blkback: ring-ref 9, event-channel 10, protocol 1 (x86_64-abi)
Unable to handle kernel paging request at ffff880074ec2b68 RIP:
 [<ffffffff804158eb>] skb_copy_bits+0x114/0x1d3
PGD 11a4067 PUD 13a6067 PMD 154e067 PTE 0
Oops: 0000 [1] SMP
last sysfs file: /devices/xen-backend/vbd-1-51712/statistics/wr_sect
CPU 2
Modules linked in: bridge netloop netbk blktap blkbk sg bonding ipv6 xfrm_nalgo crypto_api ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport e1000e serial_core bnx2 hpilo serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-164.6.1.el5xen #1
RIP: e030:[<ffffffff804158eb>] [<ffffffff804158eb>] skb_copy_bits+0x114/0x1d3
RSP: e02b:ffff8800010dfe00 EFLAGS: 00010246
RAX: 0000000000000036 RBX: ffff8800787b16c0 RCX: 0000000000000498
RDX: ffff880079612d00 RSI: ffff880074ec2b68 RDI: ffff88006916b000
RBP: 0000000000000000 R08: ffff880079612d10 R09: 00000000000004ce
R10: 0000000000000498 R11: 0000000000000000 R12: 0000000000000036
R13: 0000000000000036 R14: 0000000000000000 R15: ffff88006916b000
FS: 00002b2df42fcdc0(0000) GS:ffffffff805ca100(0000) knlGS:0000000000000000
CS: e033 DS: 002b ES: 002b
Process swapper (pid: 0, threadinfo ffff880000d98000, task ffff880000da17e0)
Stack: 0000000200000002 ffff88007c4f2dc0 0000000000000498 ffff8800661dbe80
 ffff88007cf37d00 ffff8800787b16c0 ffff880002cb9f68 ffffffff88478732
 ffff88007cf37800 0000000000000036
Call Trace:
 <IRQ> [<ffffffff88478732>] :netbk:netif_be_start_xmit+0x241/0x471
 [<ffffffff8042ac95>] __qdisc_run+0x136/0x1f9
 [<ffffffff803ae6cc>] unmask_evtchn+0x2d/0xd7
 [<ffffffff8041be2d>] net_tx_action+0xc9/0xf1
 [<ffffffff80212c99>] __do_softirq+0x8d/0x13b
 [<ffffffff80260da4>] call_softirq+0x1c/0x278
 [<ffffffff8026e0ab>] do_softirq+0x31/0x98
 [<ffffffff8026df37>] do_IRQ+0xec/0xf5
 [<ffffffff803af054>] evtchn_do_upcall+0x13b/0x1fb
 [<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
 <EOI> [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff802999eb>] rcu_pending+0x26/0x50
 [<ffffffff8026f4d5>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca50>] xen_idle+0x38/0x4a
 [<ffffffff8024afa1>] cpu_idle+0x97/0xba