So the o2net disconnect can be explained by the cpu soft lockup.
But the soft lockup itself is a bit odd. The stack shows the cpu in
_spin_unlock_irqrestore. Typically one would expect a soft lockup on
a spin_lock, and the hunt would be for the process holding that
spinlock. But then this is kvm. If pvops is enabled, then it could be
kvm related. Maybe. I am guessing here.
Check the Ubuntu bug database. Maybe they have another report of a
similar issue. That may tell us more.
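
To make it easier to compare this trace against other reports, here is a minimal Python sketch that pulls the soft lockup events and the function names of the stack frames out of kernel log lines. The names summarize, LOCKUP_RE and FRAME_RE are made up for illustration, and the sketch assumes each kernel message sits on a single line, as it does in the log file itself (not wrapped by a mail client).

```python
import re

# Matches e.g. "BUG: soft lockup - CPU#1 stuck for 61s! [kvm:32398]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^:\]]+):(\d+)\]"
)

# Matches a frame such as "[<ffffffffa0446cfd>] ? ocfs2_readpage+0x5d/0x310"
# and captures only the symbol name.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize(lines):
    """Return (lockups, frames) found in an iterable of log lines.

    lockups is a list of dicts (cpu, seconds, comm, pid);
    frames is the list of symbol names in the order they appear.
    """
    lockups = []
    frames = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            lockups.append({
                "cpu": int(m.group(1)),
                "seconds": int(m.group(2)),
                "comm": m.group(3),
                "pid": int(m.group(4)),
            })
        frames.extend(FRAME_RE.findall(line))
    return lockups, frames


# Example use (the path is the one mentioned in this thread; adjust as needed):
#   with open("/var/log/messages") as f:
#       lockups, frames = summarize(f)
#   print(lockups, frames[:5])
```

The symbol list (for example the ocfs2_* frames) gives search terms for a bug database without the addresses and offsets, which differ between kernel builds.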

On 12/14/2010 11:17 PM, Andreas Rittershofer wrote:
> On 15.12.2010 at 08:04, Sunil Mushran wrote:
>
>> On 12/14/2010 10:59 PM, Andreas Rittershofer wrote:
>>> My log says suddenly:
>>>
>>> Dec 14 02:35:16 hp1 kernel: [1492482.232822] o2net: no longer connected to node hp2 (num 1) at 192.168.1.2:7777
>>> Dec 14 02:35:18 hp1 kernel: [1492483.960150] BUG: soft lockup - CPU#1 stuck for 61s! [kvm:32398]
>>>
>>> I have no idea what is happening here or why, but the result is a lot
>>> of problems with the virtual machines.
>>>
>>>
>>> Best regards
>>>
>>> Andreas Rittershofer
>>>
>> There should be a stack in /var/log/messages in connection with
>> the soft lockup. Also, versions are good to know.
>
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] Pid: 32398, comm: kvm Not tainted 2.6.32-26-server #47-Ubuntu ProLiant DL580 G5
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RIP: 0010:[<ffffffff8155a719>] [<ffffffff8155a719>] _spin_unlock_irqrestore+0x19/0x30
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RSP: 0018:ffff8807cb61ba10 EFLAGS: 00000282
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RAX: 0000000000000282 RBX: ffff8807cb61ba18 RCX: ffff880ce47e09f0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RDX: 0000000000ae3c4c RSI: 0000000000000282 RDI: 0000000000000282
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] RBP: ffffffff81012cae R08: ffff880ce47e09e0 R09: 11ef23612a7a8443
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000286
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] R13: 0000000000000004 R14: 000000001200c2fc R15: 0000000000000000
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] FS: 00007f317085a710(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] CR2: 00007f014de7b298 CR3: 0000000cf4379000 CR4: 00000000000026e0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] Call Trace:
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffffa045c8b0>] ? ocfs2_should_refresh_lock_res+0x130/0x200 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffffa045ca4a>] ? ocfs2_inode_lock_update+0xca/0x4d0 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffffa0460df8>] ? ocfs2_inode_lock_full_nested+0x2e8/0x660 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffffa0461449>] ? ocfs2_inode_lock_with_page+0x39/0x90 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffffa0457f0e>] ? __ocfs2_cluster_unlock+0x12e/0x2f0 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffffa0461449>] ? ocfs2_inode_lock_with_page+0x39/0x90 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffffa0446cfd>] ? ocfs2_readpage+0x5d/0x310 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffff810f46b0>] ? T.811+0x100/0x400
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffff810f4a66>] ? generic_file_aio_read+0xb6/0x1d0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffffa0466930>] ? ocfs2_file_aio_read+0x100/0x420 [ocfs2]
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffff81096772>] ? futex_wait+0x222/0x350
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffff81143afa>] ? do_sync_read+0xfa/0x140
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffff81084250>] ? autoremove_wake_function+0x0/0x40
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffff8155a6ce>] ? _spin_lock+0xe/0x20
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffff81095862>] ? futex_wake+0x112/0x130
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffff81252246>] ? security_file_permission+0x16/0x20
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffff811443e5>] ? vfs_read+0xb5/0x1a0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffff811446f2>] ? sys_pread64+0x82/0xa0
> Dec 14 02:35:18 hp1 kernel: [1492483.960162] [<ffffffff810121b2>] ? system_call_fastpath+0x16/0x1b
> Dec 14 02:35:18 hp1 kernel: [1492483.960787] Modules linked in: ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs xt_multiport ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm fbcon tileblit font bitblit softcursor vga16fb vgastate radeon ttm drm_kms_helper bnx2 drm psmouse lp ipmi_si ses parport i2c_algo_bit serio_raw usbhid shpchp ipmi_msghandler hid hpilo enclosure qla2xxx scsi_transport_fc ohci1394 ieee1394 scsi_tgt e1000e cciss
>
>
> Yesterday morning and this morning I had the same problems; I just ran
> apt-get update / upgrade, hoping to avoid this problem tomorrow morning.
>
>
> Best regards
>
> Andreas Rittershofer
>