Hi!

I have a 2-node ocfs2 cluster working as an active-active HA-NFS server. A few times, disk access has hung on one of the ocfs2 file systems. When I got back from vacation, I found that the systems had hung again during my absence, but fortunately another sysadmin was able to bring them back up by rebooting and running fsck.ocfs2.

This time I was able to find some entries in the log files which might explain why a file system hangs. The two nodes are named lejonapa and kattapa. The relevant parts of the logs are as follows (sorry about the wrapped long lines):

Oct 20 05:28:49 lejonapa kernel: (3252,3):dlm_deref_lockres_handler:2337 ERROR: D7E57FB7475045C49538F2B4A6307E54:N00000000000101da: bad lockres name
Oct 20 05:28:49 kattapa kernel: (3480,1):dlm_print_one_lock_resource:50 lockres: N00000000000101da, owner=0, state=64
Oct 20 05:28:49 kattapa kernel: (3480,1):__dlm_print_one_lock_resource:82 lockres: N00000000000101da, owner=0, state=64
Oct 20 05:28:49 kattapa kernel: (3480,1):__dlm_print_one_lock_resource:84 last used: 662639943, on purge list: yes
Oct 20 05:28:49 kattapa kernel: (3480,1):dlm_print_lockres_refmap:61 refmap nodes: [ ], inflight=0
Oct 20 05:28:49 kattapa kernel: (3480,1):__dlm_print_one_lock_resource:86 granted queue:
Oct 20 05:28:49 kattapa kernel: (3480,1):__dlm_print_one_lock_resource:101 converting queue:
Oct 20 05:28:49 kattapa kernel: (3480,1):__dlm_print_one_lock_resource:116 blocked queue:
Oct 20 05:28:49 kattapa kernel: (3480,1):dlm_drop_lockres_ref:2292 ERROR: while dropping ref on D7E57FB7475045C49538F2B4A6307E54:N00000000000101da (master=0) got -22.
Oct 20 05:28:49 kattapa kernel: ------------[ cut here ]------------
Oct 20 05:28:49 kattapa kernel: kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2294!
Oct 20 05:28:49 kattapa kernel: invalid opcode: 0000 [#1]
Oct 20 05:28:49 kattapa kernel: SMP
Oct 20 05:28:49 kattapa kernel: Modules linked in: autofs nfsd exportfs ipv6 pcmcia pcmcia_core capability commoncap agpgart lp parport_pc parport pcspkr psmouse e1000 iTCO_wdt iTCO_vendor_support shpchp ata_generic serio_raw sg evdev
Oct 20 05:28:49 kattapa kernel: CPU: 1
Oct 20 05:28:49 kattapa kernel: EIP: 0060:[<c036b595>] Not tainted VLI
Oct 20 05:28:49 kattapa kernel: EFLAGS: 00010282 (2.6.21.5-smp #3)
Oct 20 05:28:49 kattapa kernel: EIP is at dlm_drop_lockres_ref+0x1c5/0x280
Oct 20 05:28:49 kattapa kernel: eax: dd495300 ebx: f7409a00 ecx: fffffd7b edx: dd4953e8
Oct 20 05:28:49 kattapa kernel: esi: c06f60f2 edi: 000008f4 ebp: 0000001f esp: f0473e60
Oct 20 05:28:49 kattapa kernel: ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
Oct 20 05:28:49 kattapa kernel: Process dlm_thread (pid: 3480, ti=f0472000 task=f5232570 task.ti=f0472000)
Oct 20 05:28:49 kattapa kernel: Stack: c07c661c 00000d98 00000001 c06f60f2 000008f4 f4609780 0000001f f29ef8e0
Oct 20 05:28:49 kattapa kernel: 00000000 ffffffea dd4953c0 f7409a00 f29ef8e0 00000000 1f010000 3030304e
Oct 20 05:28:49 kattapa kernel: 30303030 30303030 64313031 00000061 10000000 0000b536 00000000 00000000
Oct 20 05:28:49 kattapa kernel: Call Trace:
Oct 20 05:28:49 kattapa kernel: [<c035e877>] dlm_run_purge_list+0x1f7/0x430
Oct 20 05:28:49 kattapa kernel: [<c0129c18>] try_to_del_timer_sync+0x48/0x50
Oct 20 05:28:49 kattapa kernel: [<c0129c2e>] del_timer_sync+0xe/0x20
Oct 20 05:28:49 kattapa kernel: [<c06de022>] schedule_timeout+0x52/0xd0
Oct 20 05:28:49 kattapa kernel: [<c035ec76>] dlm_thread+0x56/0xf60
Oct 20 05:28:49 kattapa kernel: [<c0133c60>] autoremove_wake_function+0x0/0x50
Oct 20 05:28:49 kattapa kernel: [<c035ec20>] dlm_thread+0x0/0xf60
Oct 20 05:28:49 kattapa kernel: [<c0133aab>] kthread+0xbb/0xf0
Oct 20 05:28:49 kattapa kernel: [<c01339f0>] kthread+0x0/0xf0
Oct 20 05:28:49 kattapa kernel: [<c0103693>] kernel_thread_helper+0x7/0x14
Oct 20 05:28:49 kattapa kernel: ======================
Oct 20 05:28:49 kattapa kernel: Code: 89 74 24 0c 89 54 24 08 89 44 24 14 8b 81 a4 00 00 00 c7 04 24 1c 66 7c c0 89 44 24 04 e8 14 5f db ff 8b 44 24 28 e8 5b 2b ff ff <0f> 0b eb fe 8d b4 26 00 00 00 00 3d 00 fe ff ff 0f 84 dc fe ff
Oct 20 05:28:49 kattapa kernel: EIP: [<c036b595>] dlm_drop_lockres_ref+0x1c5/0x280 SS:ESP 0068:f0473e60

Do the messages above indicate what is wrong, or are they just another symptom of the problem?

Best regards,
Henrik

-- 
NOTE: Dear Outlook users: Please remove me from your address books.
Read this article and you will know why:
http://newsforge.com/article.pl?sid=03/08/21/143258