thr3ads.net - CentOS virt - [CentOS-virt] KVM instance keep crashing [Oct 2010]

Poh Yong Hwang

2010-Oct-14 08:38 UTC

[CentOS-virt] KVM instance keep crashing

Hi,

I have one KVM instance (CentOS 5) that keeps crashing, and I see the
following in the messages log:

Oct 14 16:24:48 localhost kernel: psmouse.c: Explorer Mouse at isa0060/serio1/input0 lost synchronization, throwing 1 bytes away.
Oct 14 16:24:49 localhost kernel: BUG: soft lockup - CPU#0 stuck for 12s! [ntpd:2363]
Oct 14 16:24:49 localhost kernel: CPU 0:
Oct 14 16:24:49 localhost kernel: Modules linked in: backupdriver(PU) ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc talpa_pedevice(U) dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy virtio_balloon virtio_pci ide_cd i2c_piix4 virtio_ring 8139too cdrom 8139cp pcspkr i2c_core virtio mii serio_raw dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Oct 14 16:24:49 localhost kernel: Pid: 2363, comm: ntpd Tainted: P 2.6.18-194.3.1.el5 #1
Oct 14 16:24:49 localhost kernel: RIP: 0010:[<ffffffff80064b50>]  [<ffffffff80064b50>] _spin_unlock_irqrestore+0x8/0x9
Oct 14 16:24:49 localhost kernel: RSP: 0018:ffffffff80446ee0  EFLAGS: 00000296
Oct 14 16:24:49 localhost kernel: RAX: 00000000000002fd RBX: ffff81007cb46b40 RCX: ffff81006975b978
Oct 14 16:24:49 localhost kernel: RDX: 0000000000000060 RSI: 0000000000000296 RDI: ffffffff80348e58
Oct 14 16:24:49 localhost kernel: RBP: ffffffff80446e60 R08: ffff81007cb46a70 R09: 0000000000000020
Oct 14 16:24:49 localhost kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8005dc8e
Oct 14 16:24:49 localhost kernel: R13: 000000000000003d R14: ffffffff8007820e R15: ffffffff80446e60
Oct 14 16:24:49 localhost kernel: FS:  00002b3519a1c030(0000) GS:ffffffff803ca000(0000) knlGS:0000000000000000
Oct 14 16:25:06 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 14 16:25:06 localhost kernel: CR2: 00002b4f3abac3d8 CR3: 0000000069726000 CR4: 00000000000006e0
Oct 14 16:25:06 localhost kernel:
Oct 14 16:25:06 localhost kernel: Call Trace:
Oct 14 16:25:06 localhost kernel:  <IRQ>  [<ffffffff80209e43>] i8042_interrupt+0x92/0x1e9
Oct 14 16:25:06 localhost kernel:  [<ffffffff80010bd1>] handle_IRQ_event+0x51/0xa6
Oct 14 16:25:07 localhost kernel:  [<ffffffff800baec9>] __do_IRQ+0xa4/0x103
Oct 14 16:25:07 localhost kernel:  [<ffffffff8006ca11>] do_IRQ+0xe7/0xf5
Oct 14 16:25:07 localhost kernel:  [<ffffffff8005d615>] ret_from_intr+0x0/0xa
Oct 14 16:25:07 localhost kernel:  <EOI>  [<ffffffff8002f73b>] dev_queue_xmit+0x0/0x271
Oct 14 16:25:07 localhost kernel:  [<ffffffff881987c6>] :8139cp:cp_start_xmit+0x4ef/0x511
Oct 14 16:25:07 localhost kernel:  [<ffffffff8819842d>] :8139cp:cp_start_xmit+0x156/0x511
Oct 14 16:25:07 localhost kernel:  [<ffffffff8022eede>] dev_hard_start_xmit+0x1b7/0x28a
Oct 14 16:25:08 localhost kernel:  [<ffffffff8023f0b8>] __qdisc_run+0x136/0x1f9
Oct 14 16:25:08 localhost kernel:  [<ffffffff8002f88b>] dev_queue_xmit+0x150/0x271
Oct 14 16:25:08 localhost kernel:  [<ffffffff80031f87>] ip_output+0x2ae/0x2dd
Oct 14 16:25:08 localhost kernel:  [<ffffffff8024d651>] ip_push_pending_frames+0x37d/0x465
Oct 14 16:25:08 localhost kernel:  [<ffffffff8025daad>] udp_push_pending_frames+0x21e/0x243
Oct 14 16:25:08 localhost kernel:  [<ffffffff8005297d>] udp_sendmsg+0x4d8/0x5ef
Oct 14 16:25:08 localhost kernel:  [<ffffffff80055336>] sock_sendmsg+0xf8/0x14a
Oct 14 16:25:09 localhost kernel:  [<ffffffff800a0abe>] autoremove_wake_function+0x0/0x2e
Oct 14 16:25:09 localhost kernel:  [<ffffffff80098f3b>] __dequeue_signal+0x12d/0x193
Oct 14 16:25:09 localhost kernel:  [<ffffffff8009899d>] recalc_sigpending+0xe/0x25
Oct 14 16:25:09 localhost kernel:  [<ffffffff8009a0db>] dequeue_signal+0x47/0xcd
Oct 14 16:25:09 localhost kernel:  [<ffffffff80070b89>] init_fpu+0x62/0x7f
Oct 14 16:25:09 localhost kernel:  [<ffffffff8006beee>] math_state_restore+0x23/0x4c
Oct 14 16:25:09 localhost kernel:  [<ffffffff8005dde9>] error_exit+0x0/0x84
Oct 14 16:25:09 localhost kernel:  [<ffffffff802264ac>] sys_sendto+0x11c/0x14f
Oct 14 16:25:10 localhost kernel:  [<ffffffff8006b011>] __switch_to+0xfe/0x22f
Oct 14 16:25:10 localhost kernel:  [<ffffffff80062ff8>] thread_return+0x62/0xfe
Oct 14 16:25:10 localhost kernel:  [<ffffffff80043b84>] sys_rt_sigreturn+0x323/0x356
Oct 14 16:25:10 localhost kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Oct 14 16:25:10 localhost kernel:

After this, the instance becomes very sluggish and unresponsive. Please advise
what could be the issue.

Thanks

YongSan

Eric Searcy

2010-Oct-14 17:27 UTC

[CentOS-virt] KVM instance keep crashing

On Oct 14, 2010, at 1:38 AM, Poh Yong Hwang wrote:
> Hi,
>
> I have one KVM instance (CentOS 5) that keeps crashing, and I see the
> following in the messages log:
>
> Oct 14 16:24:48 localhost kernel: psmouse.c: Explorer Mouse at isa0060/serio1/input0 lost synchronization, throwing 1 bytes away.
> Oct 14 16:24:49 localhost kernel: BUG: soft lockup - CPU#0 stuck for 12s! [ntpd:2363]
> Oct 14 16:24:49 localhost kernel: CPU 0:
> Oct 14 16:24:49 localhost kernel: Modules linked in: backupdriver(PU) ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc talpa_pedevice(U) dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy virtio_balloon virtio_pci ide_cd i2c_piix4 virtio_ring 8139too cdrom 8139cp pcspkr i2c_core virtio mii serio_raw dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> Oct 14 16:24:49 localhost kernel: Pid: 2363, comm: ntpd Tainted: P 2.6.18-194.3.1.el5 #1
> [...]
> After this, the instance becomes very sluggish and unresponsive. Please
> advise what could be the issue.

I'm no expert on kernel internals, but here are a couple of clarifying
questions, since the above is not clear to me.

Is the above in /var/log/messages on the guest or on the host?

Is it always an "ntpd" process on the CPU#0 stuck/soft lockup line?
Does the soft lockup always occur after a psmouse.c warning?  (Even so, the
psmouse.c warning could be a symptom of the CPU being stuck, not the
cause...)
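
Both questions can be answered by scanning the log rather than by eye. The following is only a minimal sketch: the regular expression is modelled on the single soft lockup line quoted in this thread, the function name `summarize_lockups` and the 5-line look-back window are made up for illustration, and it assumes each log entry sits on one line, as it normally does in /var/log/messages.

```python
import re
from collections import Counter, deque

# Pattern modelled on the line quoted in this thread:
#   BUG: soft lockup - CPU#0 stuck for 12s! [ntpd:2363]
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]"
)


def summarize_lockups(lines, window=5):
    """Tally soft lockups per process name.

    Also counts how many lockups had a psmouse.c "lost synchronization"
    warning within the previous `window` log lines (an arbitrary choice).
    """
    by_process = Counter()
    after_psmouse = 0
    total = 0
    recent = deque(maxlen=window)  # sliding window of preceding lines
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            total += 1
            by_process[match.group(3)] += 1
            if any("psmouse.c" in prev and "lost synchronization" in prev
                   for prev in recent):
                after_psmouse += 1
        recent.append(line)
    return total, by_process, after_psmouse


if __name__ == "__main__":
    # Two lines copied from the log posted above.
    sample = [
        "Oct 14 16:24:48 localhost kernel: psmouse.c: Explorer Mouse at "
        "isa0060/serio1/input0 lost synchronization, throwing 1 bytes away.",
        "Oct 14 16:24:49 localhost kernel: BUG: soft lockup - CPU#0 stuck "
        "for 12s! [ntpd:2363]",
    ]
    total, by_process, after_psmouse = summarize_lockups(sample)
    print("soft lockups:", total)
    print("by process:", dict(by_process))
    print("preceded by psmouse warning:", after_psmouse)
```

To run it against a real log, pass an open file instead of the sample list, for example `summarize_lockups(open("/var/log/messages"))`. Reading that file usually requires root.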

What type of hardware is this?  I notice that it says "Tainted", and
I'm assuming this refers to the kernel (as I have no idea how a userland
process, ntpd, could be "tainted"!). If so, you have a binary-distributed
kernel module loaded, and you should probably try with it unloaded to see if
the issue goes away.  The taint could also come from a machine check error,
but I think that's less likely.  To double-check, run the following in both
the host and guest:

cat /proc/sys/kernel/tainted

The value printed is a bitwise OR of the taint flags described in
http://www.kernel.org/doc/Documentation/sysctl/kernel.txt
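
Decoding that number by hand is easy to get wrong, so here is a small sketch that does it. It covers only the six lowest bits as listed in the upstream kernel documentation; the function names are made up for illustration, and a distribution kernel such as the 2.6.18 el5 one in this thread may define extra flags that are not in this table. In the log above the letter after "Tainted:" is P, which corresponds to bit 0.

```python
# Taint bits 0-5 as listed in the upstream kernel documentation.
# Assumption: vendor kernels may add flags that this table does not know.
TAINT_FLAGS = {
    0: ("P", "proprietary module loaded"),
    1: ("F", "module was force-loaded"),
    2: ("S", "SMP kernel running on CPUs not designed for SMP"),
    3: ("R", "module was force-unloaded"),
    4: ("M", "machine check error occurred"),
    5: ("B", "bad page discovered"),
}


def decode_taint(value):
    """Return a description for every known taint bit set in `value`."""
    return [
        f"{letter}: {description}"
        for bit, (letter, description) in sorted(TAINT_FLAGS.items())
        if value & (1 << bit)
    ]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint value (same file as the `cat` command above)."""
    with open(path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    try:
        value = read_taint()
    except OSError as error:
        print("could not read taint value:", error)
    else:
        print("tainted =", value)
        for entry in decode_taint(value) or ["no known taint flags set"]:
            print(" ", entry)
```

A result that contains only the P entry would point at the binary module, while an M entry would support the machine check theory.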

Eric
