Hello,

We ran into a problem with our cluster this morning.

Cluster setup:

Servers:
 * Two Dell R800 servers running CentOS 5.5 and UEK 2.6.32-100.0.19.el5,
   with 8 GB of RAM each;
 * OCFS2 1.6.4;
 * iSCSI connectivity over two bonded Gbit NICs.

Storage:
 * Dell EqualLogic 4000VX -- iSCSI

Network:
 * Two Dell 1 Gbit trunked switches.

Problem description:

Node #1's access to the filesystem hung, and the hung tasks all show
nearly the same stack trace as one of the following:

-------------------------------------------------------------------------------
INFO: task maildirsize:17252 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
maildirsize   D 0000000000000004     0 17252  17249 0x00000080
 ffff8800b181d7f8 0000000000000086 0000000000000000 ffff880098de3c40
 ffff88001293a1c0 ffff88022bb7c4c0 ffff88001293a598 ffffffffa036a654
 ffff8800b181d7e8 ffffffff81043d10 ffff88001293a1f8 7fffffffffffffff
Call Trace:
 [<ffffffffa036a654>] ? dlmlock+0x12e2/0x13bb [ocfs2_dlm]
 [<ffffffff81043d10>] ? update_curr+0xc9/0xd2
 [<ffffffff8143798a>] schedule_timeout+0x36/0xe7
 [<ffffffff810425b3>] ? need_resched+0x23/0x2d
 [<ffffffff814377fc>] wait_for_common+0xb7/0x12c
 [<ffffffff8104bc15>] ? default_wake_function+0x0/0x19
 [<ffffffffa03b07fe>] ? lockres_clear_flags+0x15/0x17 [ocfs2]
 [<ffffffff81437914>] wait_for_completion+0x1d/0x1f
 [<ffffffffa03b15ad>] ocfs2_wait_for_mask+0x1a/0x29 [ocfs2]
 [<ffffffffa03b1f59>] __ocfs2_cluster_lock+0x83c/0x861 [ocfs2]
 [<ffffffffa03c6319>] ? ocfs2_inode_cache_io_unlock+0x12/0x14 [ocfs2]
 [<ffffffffa03f48f5>] ? ocfs2_metadata_cache_io_unlock+0x1e/0x20 [ocfs2]
 [<ffffffffa03c344c>] ? ocfs2_validate_inode_block+0x0/0x1cd [ocfs2]
 [<ffffffffa03c3242>] ? ocfs2_read_inode_block_full+0x3e/0x5a [ocfs2]
 [<ffffffffa03b38dc>] ocfs2_inode_lock_full_nested+0x194/0xb8d [ocfs2]
 [<ffffffffa03d3f8c>] ? ocfs2_rename+0x49e/0x183d [ocfs2]
 [<ffffffffa03c344c>] ? ocfs2_validate_inode_block+0x0/0x1cd [ocfs2]
 [<ffffffffa03d3f8c>] ocfs2_rename+0x49e/0x183d [ocfs2]
 [<ffffffffa03a6792>] ? brelse+0x13/0x15 [ocfs2]
 [<ffffffffa03b0692>] ? init_completion+0x1f/0x21 [ocfs2]
 [<ffffffffa03b0741>] ? ocfs2_init_mask_waiter+0x26/0x3f [ocfs2]
 [<ffffffffa03b0692>] ? init_completion+0x1f/0x21 [ocfs2]
 [<ffffffffa03b202c>] ? ocfs2_should_refresh_lock_res+0x8f/0x1ad [ocfs2]
 [<ffffffff810425b3>] ? need_resched+0x23/0x2d
 [<ffffffff810e8512>] ? kstrdup+0x2b/0xc0
 [<ffffffff811229eb>] vfs_rename+0x221/0x3c0
 [<ffffffff81124968>] sys_renameat+0x18b/0x201
 [<ffffffff81075a7c>] ? autoremove_wake_function+0x0/0x3d
 [<ffffffff811178fc>] ? fsnotify_modify+0x6c/0x74
 [<ffffffff8112186b>] ? path_put+0x22/0x27
 [<ffffffff811249f9>] sys_rename+0x1b/0x1d
 [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b

INFO: task imapd:17386 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
imapd         D 000000000000000b     0 17386   4367 0x00000080
 ffff880208709c08 0000000000000086 0000000000000000 0000000000000286
 ffff8801501ac800 ffff88012b008180 ffff8801501acbd8 00000001bbc4c49f
 0000000000000000 0000000000000000 0000000000000000 ffff880127bf8d9c
Call Trace:
 [<ffffffff81437e31>] __mutex_lock_common+0x12f/0x1a1
 [<ffffffff81437ef2>] __mutex_lock_slowpath+0x19/0x1b
 [<ffffffff81437f5b>] mutex_lock+0x23/0x3a
 [<ffffffff81121aaa>] do_lookup+0x85/0x162
 [<ffffffff8112445c>] __link_path_walk+0x49e/0x5fb
 [<ffffffff81238204>] ? __strncpy_from_user+0x31/0x4a
 [<ffffffff8112460c>] path_walk+0x53/0x9c
 [<ffffffff81124727>] do_path_lookup+0x2f/0x7a
 [<ffffffff8112514f>] user_path_at+0x57/0x91
 [<ffffffff810f0001>] ? handle_mm_fault+0x14b/0x7d9
 [<ffffffff8111c366>] vfs_fstatat+0x3a/0x67
 [<ffffffffa03b0567>] ? ocfs2_inode_unlock+0x140/0x1a5 [ocfs2]
 [<ffffffff8111c479>] vfs_stat+0x1b/0x1d
 [<ffffffff8111c49a>] sys_newstat+0x1f/0x39
 [<ffffffff8143b1e3>] ? do_page_fault+0x25d/0x26c
 [<ffffffff810a7d03>] ? audit_syscall_entry+0x103/0x12f
 [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b
-------------------------------------------------------------------------------
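In case it is useful: the traces above were printed by the hung-task
watchdog, but the same stacks can also be dumped on demand for every
blocked (D state) task via sysrq -- a minimal sketch, assuming sysrq is
enabled on the node:

  # Dump the stacks of all uninterruptible (blocked) tasks into the kernel log.
  echo 1 > /proc/sys/kernel/sysrq      # enable sysrq if it is not already
  echo w > /proc/sysrq-trigger         # "w" = show blocked tasks
  dmesg | tail -n 300                  # the dump lands in dmesg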
When we rebooted node #1, the following recovery messages were logged by
node #0:

-------------------------------------------------------------------------------
o2net: connection to node XXXXXXXXXX (num 1) at ip.ip.ip.2:7777 has been
idle for 60.0 seconds, shutting it down.
(swapper,0,0):o2net_idle_timer:1498 here are some times that might help
debug the situation: (tmr 1314962116.650772 now 1314962176.650058 dr
1314962116.650749 adv 1314962116.650781:1314962116.650782 func
(3f8ab666:504) 1314962114.651682:1314962114.651687)
o2net: no longer connected to node XXXXXXXXXX (num 1) at ip.ip.ip.2:7777
(dlm_thread,4143,2):dlm_send_proxy_ast_msg:456 ERROR: Error -112 when
sending message 505 (key 0x3f8ab666) to node 1
(dlm_thread,4143,2):dlm_flush_asts:599 ERROR: status = -112
(o2net,4052,0):o2net_connect_expired:1659 ERROR: no connection
established with node 1 after 60.0 seconds, giving up and returning
errors.
(o2hb-E4FCE3DA14,4121,1):o2dlm_eviction_cb:267 o2dlm has evicted node 1
from group E4FCE3DA14E94515B327E8558F4641DA
(ocfs2rec,13693,10):ocfs2_replay_journal:1605 Recovering node 1 from
slot 1 on device (8,16)
(dlm_reco_thread,4144,0):dlm_get_lock_resource:836
E4FCE3DA14E94515B327E8558F4641DA:$RECOVERY: at least one node (1) to
recover before lock mastery can begin
(dlm_reco_thread,4144,0):dlm_get_lock_resource:870
E4FCE3DA14E94515B327E8558F4641DA: recovery map is not empty, but must
master $RECOVERY lock now
(dlm_reco_thread,4144,0):dlm_do_recovery:523 (4144) Node 0 is the
Recovery Master for the Dead Node 1 for Domain
E4FCE3DA14E94515B327E8558F4641DA
(ocfs2rec,13693,7):ocfs2_begin_quota_recovery:407 Beginning quota
recovery in slot 1
(ocfs2_wq,4124,1):ocfs2_finish_quota_recovery:598 Finishing quota
recovery in slot 1
-------------------------------------------------------------------------------

The processes hung on node #1 had been waiting on I/O for more than an
hour, so it seems to me that node #0 never replied to the dlmlock call
shown in the first stack trace from node #1, yet the reply message was
still queued for sending on node #0, as shown in its dmesg when node #1
rebooted (the dlm_send_proxy_ast_msg failure with error -112, i.e.
EHOSTDOWN, once the connection had been torn down). The question is:
what could prevent the dlm on node #0 from replying to the lock request
from node #1?

We made copies of all the debug files under /sys/kernel/debug on node #1
before rebooting it, and we can provide them if needed (9.3 MiB of
compressed data).
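For reference, a snapshot of that state can be captured along these
lines -- a minimal sketch, assuming the usual OCFS2 1.6 debugfs layout
(ocfs2/, o2dlm/, o2net/ and o2hb/ under /sys/kernel/debug). The entries
are read with cat into regular files first, because debugfs files
typically report a size of 0 and tar alone would archive them empty:

  # Copy the ocfs2/o2dlm debugfs state into regular files, then compress.
  mkdir -p /tmp/ocfs2-debug
  cd /sys/kernel/debug || exit 1
  find ocfs2 o2dlm o2net o2hb -type f 2>/dev/null | while read -r f; do
      mkdir -p "/tmp/ocfs2-debug/$(dirname "$f")"
      cat "$f" > "/tmp/ocfs2-debug/$f"
  done
  tar czf /tmp/node1-debugfs.tar.gz -C /tmp ocfs2-debug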
Regards,

--
Sérgio Surkamp | Network Manager
sergio at gruposinternet.com.br

Grupos Internet S.A.
R. Lauro Linhares, 2123 Torre B - Sala 201
Trindade - Florianópolis - SC
+55 48 3234-4109
http://www.gruposinternet.com.br


Log what you have in a bz. I can take a look. I doubt you will be able
to attach that file though. You'll need to provide me with a link.

On 09/02/2011 07:28 AM, Sérgio Surkamp wrote:
> The question is: what could prevent the dlm on node #0 from replying
> to the lock request from node #1?
>
> We made copies of all the debug files under /sys/kernel/debug on
> node #1 before rebooting it, and we can provide them if needed
> (9.3 MiB of compressed data).