Hi,

I have one KVM instance (CentOS 5) that keeps crashing, and I see the following in the message log:

Oct 14 16:24:48 localhost kernel: psmouse.c: Explorer Mouse at isa0060/serio1/input0 lost synchronization, throwing 1 bytes away.
Oct 14 16:24:49 localhost kernel: BUG: soft lockup - CPU#0 stuck for 12s! [ntpd:2363]
Oct 14 16:24:49 localhost kernel: CPU 0:
Oct 14 16:24:49 localhost kernel: Modules linked in: backupdriver(PU) ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc talpa_pedevice(U) dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy virtio_balloon virtio_pci ide_cd i2c_piix4 virtio_ring 8139too cdrom 8139cp pcspkr i2c_core virtio mii serio_raw dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Oct 14 16:24:49 localhost kernel: Pid: 2363, comm: ntpd Tainted: P 2.6.18-194.3.1.el5 #1
Oct 14 16:24:49 localhost kernel: RIP: 0010:[<ffffffff80064b50>] [<ffffffff80064b50>] _spin_unlock_irqrestore+0x8/0x9
Oct 14 16:24:49 localhost kernel: RSP: 0018:ffffffff80446ee0 EFLAGS: 00000296
Oct 14 16:24:49 localhost kernel: RAX: 00000000000002fd RBX: ffff81007cb46b40 RCX: ffff81006975b978
Oct 14 16:24:49 localhost kernel: RDX: 0000000000000060 RSI: 0000000000000296 RDI: ffffffff80348e58
Oct 14 16:24:49 localhost kernel: RBP: ffffffff80446e60 R08: ffff81007cb46a70 R09: 0000000000000020
Oct 14 16:24:49 localhost kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8005dc8e
Oct 14 16:24:49 localhost kernel: R13: 000000000000003d R14: ffffffff8007820e R15: ffffffff80446e60
Oct 14 16:24:49 localhost kernel: FS: 00002b3519a1c030(0000) GS:ffffffff803ca000(0000) knlGS:0000000000000000
Oct 14 16:25:06 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 14 16:25:06 localhost kernel: CR2: 00002b4f3abac3d8 CR3: 0000000069726000 CR4: 00000000000006e0
Oct 14 16:25:06 localhost kernel:
Oct 14 16:25:06 localhost kernel: Call Trace:
Oct 14 16:25:06 localhost kernel: <IRQ> [<ffffffff80209e43>] i8042_interrupt+0x92/0x1e9
Oct 14 16:25:06 localhost kernel: [<ffffffff80010bd1>] handle_IRQ_event+0x51/0xa6
Oct 14 16:25:07 localhost kernel: [<ffffffff800baec9>] __do_IRQ+0xa4/0x103
Oct 14 16:25:07 localhost kernel: [<ffffffff8006ca11>] do_IRQ+0xe7/0xf5
Oct 14 16:25:07 localhost kernel: [<ffffffff8005d615>] ret_from_intr+0x0/0xa
Oct 14 16:25:07 localhost kernel: <EOI> [<ffffffff8002f73b>] dev_queue_xmit+0x0/0x271
Oct 14 16:25:07 localhost kernel: [<ffffffff881987c6>] :8139cp:cp_start_xmit+0x4ef/0x511
Oct 14 16:25:07 localhost kernel: [<ffffffff8819842d>] :8139cp:cp_start_xmit+0x156/0x511
Oct 14 16:25:07 localhost kernel: [<ffffffff8022eede>] dev_hard_start_xmit+0x1b7/0x28a
Oct 14 16:25:08 localhost kernel: [<ffffffff8023f0b8>] __qdisc_run+0x136/0x1f9
Oct 14 16:25:08 localhost kernel: [<ffffffff8002f88b>] dev_queue_xmit+0x150/0x271
Oct 14 16:25:08 localhost kernel: [<ffffffff80031f87>] ip_output+0x2ae/0x2dd
Oct 14 16:25:08 localhost kernel: [<ffffffff8024d651>] ip_push_pending_frames+0x37d/0x465
Oct 14 16:25:08 localhost kernel: [<ffffffff8025daad>] udp_push_pending_frames+0x21e/0x243
Oct 14 16:25:08 localhost kernel: [<ffffffff8005297d>] udp_sendmsg+0x4d8/0x5ef
Oct 14 16:25:08 localhost kernel: [<ffffffff80055336>] sock_sendmsg+0xf8/0x14a
Oct 14 16:25:09 localhost kernel: [<ffffffff800a0abe>] autoremove_wake_function+0x0/0x2e
Oct 14 16:25:09 localhost kernel: [<ffffffff80098f3b>] __dequeue_signal+0x12d/0x193
Oct 14 16:25:09 localhost kernel: [<ffffffff8009899d>] recalc_sigpending+0xe/0x25
Oct 14 16:25:09 localhost kernel: [<ffffffff8009a0db>] dequeue_signal+0x47/0xcd
Oct 14 16:25:09 localhost kernel: [<ffffffff80070b89>] init_fpu+0x62/0x7f
Oct 14 16:25:09 localhost kernel: [<ffffffff8006beee>] math_state_restore+0x23/0x4c
Oct 14 16:25:09 localhost kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
Oct 14 16:25:09 localhost kernel: [<ffffffff802264ac>] sys_sendto+0x11c/0x14f
Oct 14 16:25:10 localhost kernel: [<ffffffff8006b011>] __switch_to+0xfe/0x22f
Oct 14 16:25:10 localhost kernel: [<ffffffff80062ff8>] thread_return+0x62/0xfe
Oct 14 16:25:10 localhost kernel: [<ffffffff80043b84>] sys_rt_sigreturn+0x323/0x356
Oct 14 16:25:10 localhost kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Oct 14 16:25:10 localhost kernel:

After this, the instance becomes very sluggish and unresponsive. Please advise what could be the issue.

Thanks
YongSan

On Oct 14, 2010, at 1:38 AM, Poh Yong Hwang wrote:

> Hi,
>
> I have one KVM instance (centos 5) that keeps crashing and i see the message log with the following:
>
> Oct 14 16:24:48 localhost kernel: psmouse.c: Explorer Mouse at isa0060/serio1/input0 lost synchronization, throwing 1 bytes away.
> Oct 14 16:24:49 localhost kernel: BUG: soft lockup - CPU#0 stuck for 12s! [ntpd:2363]
> Oct 14 16:24:49 localhost kernel: CPU 0:
> Oct 14 16:24:49 localhost kernel: Modules linked in: backupdriver(PU) ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc talpa_pedevice(U) dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy virtio_balloon virtio_pci ide_cd i2c_piix4 virtio_ring 8139too cdrom 8139cp pcspkr i2c_core virtio mii serio_raw dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> Oct 14 16:24:49 localhost kernel: Pid: 2363, comm: ntpd Tainted: P 2.6.18-194.3.1.el5 #1
[...]
> Afterwhich the instance become very sluggish and unresponsive. Please advise what could be the issue.

I'm no expert on kernel stuff, but I thought I'd throw in a couple of suggested points of clarification on your request, since the above is not clear to me.

Is the above in /var/log/messages on the guest or the host?

Is it always an "ntpd" process on the CPU#0 stuck/soft lockup line?

Does the soft lockup always occur after a psmouse.c warning? (Even so, the psmouse.c warning could be a symptom of the CPU being stuck, not the cause...)

What type of hardware is this?

I notice that it says "Tainted", and I'm assuming this refers to the kernel (as I have no idea how a userland process, ntpd, could be "tainted"!). If so, you have a binary-distributed kernel module loaded, and you should probably try with that unloaded to see if the issue goes away. It could be a machine check error, but that's less likely, I think.

To double check, run the following in both the host and guest:

    cat /proc/sys/kernel/tainted

This ORed value can be checked against the flags given in http://www.kernel.org/doc/Documentation/sysctl/kernel.txt

Eric
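
For anyone who would rather not decode the ORed value by hand, here is a rough Python 3 sketch of the check described above. The helper names (`decode_taint`, `read_taint`) and the table are only illustrative. The bit meanings are the low six bits as I understand them from the upstream kernel documentation; a vendor kernel may define others, so treat the kernel.txt linked above as the authority.

```python
# Assumed meanings of the low taint bits, per upstream kernel documentation.
# A vendor kernel may use additional or different bits; this table is a sketch.
TAINT_FLAGS = {
    0: "proprietary module loaded (P)",
    1: "module force-loaded (F)",
    2: "SMP on hardware not designed for it (S)",
    3: "module force-unloaded (R)",
    4: "machine check exception reported (M)",
    5: "bad page referenced (B)",
}


def decode_taint(value):
    """Return one description per bit that is set in the taint value."""
    found = []
    bit = 0
    while value >> bit:
        if value & (1 << bit):
            default = "bit %d: not in this table, check kernel.txt" % bit
            found.append(TAINT_FLAGS.get(bit, default))
        bit += 1
    return found


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the integer taint value that the kernel exposes."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        value = read_taint()
    except (OSError, ValueError) as err:
        print("could not read taint value: %s" % err)
    else:
        print("tainted = %d" % value)
        for line in decode_taint(value):
            print("  " + line)
```

A value of 0 means the kernel is not tainted. A value with only bit 0 set would match the "Tainted: P" shown in the log, while bit 4 would point at the machine check case mentioned above.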