This is the same issue as:
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1012
Is this happening frequently? We have failed to reproduce it in
our test cluster.
If you can reproduce it, I could give you a potential fix for testing.
Let me know.
Sunil
Christian van Barneveld wrote:
> Hi,
>
> Over the last few weeks we have had several kernel stack traces, and after
> that the ocfs2 filesystems no longer respond at all (no output from ls) on
> any of the nodes.
>
> kern.log at node-2:
>
> ----------------------------------------------------------------------------
> Oct 3 06:57:18 XXX kernel: (7178,0):dlm_drop_lockres_ref:2291 ERROR: while dropping ref on 6EDBC1B22BBB4E28AD9453CD5B2F60C3:M000000000000000007f06600000000 (master=0) got -22.
> Oct 3 06:57:18 XXX kernel: (7178,0):dlm_print_one_lock_resource:50 lockres: M000000000000000007f06600000000, owner=0, state=64
> Oct 3 06:57:18 XXX kernel: (7178,0):__dlm_print_one_lock_resource:82 lockres: M000000000000000007f06600000000, owner=0, state=64
> Oct 3 06:57:18 XXX kernel: (7178,0):__dlm_print_one_lock_resource:84 last used: 49827182, on purge list: yes
> Oct 3 06:57:18 XXX kernel: (7178,0):dlm_print_lockres_refmap:61 refmap nodes: [ ], inflight=0
> Oct 3 06:57:18 XXX kernel: (7178,0):__dlm_print_one_lock_resource:86 granted queue:
> Oct 3 06:57:18 XXX kernel: (7178,0):__dlm_print_one_lock_resource:101 converting queue:
> Oct 3 06:57:18 XXX kernel: (7178,0):__dlm_print_one_lock_resource:116 blocked queue:
> Oct 3 06:57:20 XXX kernel: ------------[ cut here ]------------
> Oct 3 06:57:20 XXX kernel: kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2293!
> Oct 3 06:57:20 XXX kernel: invalid opcode: 0000 [#1] SMP
> Oct 3 06:57:20 XXX kernel: Modules linked in: ocfs2 xt_multiport nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter dm_round_robin dm_rdac ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs dm_multipath dm_mod qla2xxx
> Oct 3 06:57:20 XXX kernel:
> Oct 3 06:57:20 XXX kernel: Pid: 7178, comm: dlm_thread Not tainted (2.6.25.5-qla2xxx-mpath-fw-cluster-hm64 #1)
> Oct 3 06:57:20 XXX kernel: EIP: 0060:[<f8eebd11>] EFLAGS: 00010286 CPU: 0
> Oct 3 06:57:20 XXX kernel: EIP is at dlm_drop_lockres_ref+0x1c1/0x280 [ocfs2_dlm]
> Oct 3 06:57:20 XXX kernel: EAX: e79268a8 EBX: f7118600 ECX: c06a6ca4 EDX: 00000092
> Oct 3 06:57:20 XXX kernel: ESI: ffffffea EDI: f5b21eff EBP: 0000001f ESP: f5b21ea4
> Oct 3 06:57:20 XXX kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Oct 3 06:57:20 XXX kernel: Process dlm_thread (pid: 7178, ti=f5b20000 task=f72ec430 task.ti=f5b20000)
> Oct 3 06:57:20 XXX kernel: Stack: f8efebec 00001c0a 00000000 f8ef9cd2 000008f3 f599b940 0000001f ede9c460
> Oct 3 06:57:20 XXX kernel:        00000000 ffffffea e7926880 f7118600 ede9c460 00000000 1f010000 3030304d
> Oct 3 06:57:20 XXX kernel:        30303030 30303030 30303030 66373030 30363630 30303030 00303030 00000000
> Oct 3 06:57:20 XXX kernel: Call Trace:
> Oct 3 06:57:20 XXX kernel: [<f8edf347>] dlm_thread+0x327/0x1420 [ocfs2_dlm]
> Oct 3 06:57:20 XXX kernel: [<c011beb9>] hrtick_set+0x69/0x140
> Oct 3 06:57:20 XXX kernel: [<c0133180>] autoremove_wake_function+0x0/0x50
> Oct 3 06:57:20 XXX kernel: [<f8edf020>] dlm_thread+0x0/0x1420 [ocfs2_dlm]
> Oct 3 06:57:20 XXX kernel: [<c0132e92>] kthread+0x42/0x70
> Oct 3 06:57:20 XXX kernel: [<c0132e50>] kthread+0x0/0x70
> Oct 3 06:57:20 XXX kernel: [<c0103a17>] kernel_thread_helper+0x7/0x10
> Oct 3 06:57:20 XXX kernel: ======================
> Oct 3 06:57:20 XXX kernel: Code: d2 9c ef f8 89 54 24 08 89 44 24 14 8b 81 d8 00 00 00 c7 04 24 ec eb ef f8 89 44 24 04 e8 98 55 23 c7 8b 44 24 28 e8 3f 2c ff ff <0f> 0b eb fe 3d 00 fe ff ff 0f 95 c2 83 f8 fc 0f 95 c0 84 d0 0f
> Oct 3 06:57:20 XXX kernel: EIP: [<f8eebd11>] dlm_drop_lockres_ref+0x1c1/0x280 [ocfs2_dlm] SS:ESP 0068:f5b21ea4
> Oct 3 06:57:20 XXX kernel: ---[ end trace 52ed3dea72cac956 ]---
>
>
> ----------------------------------------------------------------------------
>
> kern.log at node-1:
>
> Oct 3 06:57:18 XXX kernel: (5799,1):dlm_deref_lockres_handler:2336 ERROR: 6EDBC1B22BBB4E28AD9453CD5B2F60C3:M000000000000000007f06600000000: bad lockres name
>
> # uname -r
> 2.6.25.5
>
> # debugfs.ocfs2 -V
> debugfs.ocfs2 1.4.1
>
> # dmesg
> OCFS2 Node Manager 1.5.0
> OCFS2 DLM 1.5.0
> OCFS2 DLMFS 1.5.0
>
> We have 2 nodes in the cluster and the freeze was observed on both nodes.
> Only a reboot solves the problem.
>
> Any help appreciated.
>
> Christian van Barneveld
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>