Hello,

today I noticed the following on *only* one node:

----- cut here -----
Apr 29 11:01:18 node06 kernel: [2569440.616036] INFO: task ocfs2_wq:5214 blocked for more than 120 seconds.
Apr 29 11:01:18 node06 kernel: [2569440.616056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 29 11:01:18 node06 kernel: [2569440.616080] ocfs2_wq      D 0000000000000002     0  5214      2 0x00000000
Apr 29 11:01:18 node06 kernel: [2569440.616101]  ffff88014fa63880 0000000000000046 ffffffffa01878a5 ffffffffa020f0fc
Apr 29 11:01:18 node06 kernel: [2569440.616131]  0000000000000000 000000000000f8a0 ffff88014baebfd8 00000000000155c0
Apr 29 11:01:18 node06 kernel: [2569440.616161]  00000000000155c0 ffff88014ca38e20 ffff88014ca39118 00000001a0187b86
Apr 29 11:01:18 node06 kernel: [2569440.616192] Call Trace:
Apr 29 11:01:18 node06 kernel: [2569440.616223]  [<ffffffffa01878a5>] ? scsi_done+0x0/0xc [scsi_mod]
Apr 29 11:01:18 node06 kernel: [2569440.616245]  [<ffffffffa020f0fc>] ? qla2xxx_queuecommand+0x171/0x1de [qla2xxx]
Apr 29 11:01:18 node06 kernel: [2569440.616273]  [<ffffffffa018d290>] ? scsi_request_fn+0x429/0x506 [scsi_mod]
Apr 29 11:01:18 node06 kernel: [2569440.616291]  [<ffffffffa02ab0a7>] ? o2dlm_blocking_ast_wrapper+0x0/0x17 [ocfs2_stack_o2cb]
Apr 29 11:01:18 node06 kernel: [2569440.616317]  [<ffffffffa02ab090>] ? o2dlm_lock_ast_wrapper+0x0/0x17 [ocfs2_stack_o2cb]
Apr 29 11:01:18 node06 kernel: [2569440.616345]  [<ffffffff812ee253>] ? schedule_timeout+0x2e/0xdd
Apr 29 11:01:18 node06 kernel: [2569440.616362]  [<ffffffff8118d99a>] ? vsnprintf+0x40a/0x449
Apr 29 11:01:18 node06 kernel: [2569440.616378]  [<ffffffff812ee118>] ? wait_for_common+0xde/0x14f
Apr 29 11:01:18 node06 kernel: [2569440.616396]  [<ffffffff8104a188>] ? default_wake_function+0x0/0x9
Apr 29 11:01:18 node06 kernel: [2569440.616421]  [<ffffffffa0fbac46>] ? __ocfs2_cluster_lock+0x8a4/0x8c5 [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616445]  [<ffffffff812ee517>] ? out_of_line_wait_on_bit+0x6b/0x77
Apr 29 11:01:18 node06 kernel: [2569440.616468]  [<ffffffffa0fbe8ff>] ? ocfs2_inode_lock_full_nested+0x1a3/0xb2c [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616497]  [<ffffffffa0ffacc1>] ? ocfs2_lock_global_qf+0x28/0x81 [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616519]  [<ffffffffa0ffacc1>] ? ocfs2_lock_global_qf+0x28/0x81 [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616540]  [<ffffffffa0ffb3a3>] ? ocfs2_acquire_dquot+0x8d/0x105 [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616557]  [<ffffffff812ee7b5>] ? mutex_lock+0xd/0x31
Apr 29 11:01:18 node06 kernel: [2569440.616574]  [<ffffffff8112c2b2>] ? dqget+0x2ce/0x318
Apr 29 11:01:18 node06 kernel: [2569440.616589]  [<ffffffff8112cbad>] ? dquot_initialize+0x51/0x115
Apr 29 11:01:18 node06 kernel: [2569440.616611]  [<ffffffffa0fcaab8>] ? ocfs2_delete_inode+0x0/0x1640 [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616630]  [<ffffffff810fee1f>] ? generic_delete_inode+0xd7/0x168
Apr 29 11:01:18 node06 kernel: [2569440.616652]  [<ffffffffa0fca061>] ? ocfs2_drop_inode+0xc0/0x123 [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616669]  [<ffffffff810fdfa8>] ? iput+0x27/0x60
Apr 29 11:01:18 node06 kernel: [2569440.616689]  [<ffffffffa0fd0a8f>] ? ocfs2_complete_recovery+0x82b/0xa3f [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616715]  [<ffffffff8106144b>] ? worker_thread+0x188/0x21d
Apr 29 11:01:18 node06 kernel: [2569440.616736]  [<ffffffffa0fd0264>] ? ocfs2_complete_recovery+0x0/0xa3f [ocfs2]
Apr 29 11:01:18 node06 kernel: [2569440.616761]  [<ffffffff81064a36>] ? autoremove_wake_function+0x0/0x2e
Apr 29 11:01:18 node06 kernel: [2569440.616778]  [<ffffffff810612c3>] ? worker_thread+0x0/0x21d
Apr 29 11:01:18 node06 kernel: [2569440.616793]  [<ffffffff81064769>] ? kthread+0x79/0x81
Apr 29 11:01:18 node06 kernel: [2569440.616810]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
Apr 29 11:01:18 node06 kernel: [2569440.616825]  [<ffffffff810646f0>] ? kthread+0x0/0x81
Apr 29 11:01:18 node06 kernel: [2569440.616840]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
----- cut here -----

On all the others I had the following:

----- cut here -----
Apr 29 11:00:23 node01 kernel: [2570880.752038] INFO: task o2quot/0:2971 blocked for more than 120 seconds.
Apr 29 11:00:23 node01 kernel: [2570880.752059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 29 11:00:23 node01 kernel: [2570880.752083] o2quot/0      D 0000000000000000     0  2971      2 0x00000000
Apr 29 11:00:23 node01 kernel: [2570880.752104]  ffffffff814451f0 0000000000000046 0000000000000000 0000000000000002
Apr 29 11:00:23 node01 kernel: [2570880.752134]  ffff880249e28d20 000000000000f8a0 ffff88024cda3fd8 00000000000155c0
Apr 29 11:00:23 node01 kernel: [2570880.752164]  00000000000155c0 ffff88024ce4e9f0 ffff88024ce4ece8 000000004cda3a60
Apr 29 11:00:23 node01 kernel: [2570880.752195] Call Trace:
Apr 29 11:00:23 node01 kernel: [2570880.752214]  [<ffffffff812ee253>] ? schedule_timeout+0x2e/0xdd
Apr 29 11:00:23 node01 kernel: [2570880.752233]  [<ffffffff8110baff>] ? __find_get_block+0x176/0x186
Apr 29 11:00:23 node01 kernel: [2570880.752261]  [<ffffffffa04fd29c>] ? ocfs2_validate_quota_block+0x0/0x88 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752286]  [<ffffffff812ee118>] ? wait_for_common+0xde/0x14f
Apr 29 11:00:23 node01 kernel: [2570880.752304]  [<ffffffff8104a188>] ? default_wake_function+0x0/0x9
Apr 29 11:00:23 node01 kernel: [2570880.752326]  [<ffffffffa04bbc46>] ? __ocfs2_cluster_lock+0x8a4/0x8c5 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752351]  [<ffffffff81044e0e>] ? find_busiest_group+0x3af/0x874
Apr 29 11:00:23 node01 kernel: [2570880.752373]  [<ffffffffa04bf8ff>] ? ocfs2_inode_lock_full_nested+0x1a3/0xb2c [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752402]  [<ffffffffa04fbcc1>] ? ocfs2_lock_global_qf+0x28/0x81 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752424]  [<ffffffffa04fbcc1>] ? ocfs2_lock_global_qf+0x28/0x81 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752446]  [<ffffffffa04fc8f8>] ? ocfs2_sync_dquot_helper+0xca/0x300 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752474]  [<ffffffffa04fc82e>] ? ocfs2_sync_dquot_helper+0x0/0x300 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752500]  [<ffffffff8112ce8e>] ? dquot_scan_active+0x78/0xd0
Apr 29 11:00:23 node01 kernel: [2570880.752521]  [<ffffffffa04fbc2b>] ? qsync_work_fn+0x24/0x42 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752539]  [<ffffffff8106144b>] ? worker_thread+0x188/0x21d
Apr 29 11:00:23 node01 kernel: [2570880.752559]  [<ffffffffa04fbc07>] ? qsync_work_fn+0x0/0x42 [ocfs2]
Apr 29 11:00:23 node01 kernel: [2570880.752576]  [<ffffffff81064a36>] ? autoremove_wake_function+0x0/0x2e
Apr 29 11:00:23 node01 kernel: [2570880.752593]  [<ffffffff810612c3>] ? worker_thread+0x0/0x21d
Apr 29 11:00:23 node01 kernel: [2570880.752608]  [<ffffffff81064769>] ? kthread+0x79/0x81
Apr 29 11:00:23 node01 kernel: [2570880.752625]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
Apr 29 11:00:23 node01 kernel: [2570880.752640]  [<ffffffff810646f0>] ? kthread+0x0/0x81
Apr 29 11:00:23 node01 kernel: [2570880.752655]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
----- cut here -----

Looking at the timestamps, it seems that o2quot got stuck before ocfs2_wq, but right now I can't guarantee that they are 100% exact...

Am I right in thinking this was a hardware failure?

Best regards,
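When comparing timestamps across nodes, it can help to pull just the hung-task summary lines out of the saved logs. A small sketch (the function name and the /tmp sample path are mine, not from the thread; adjust the log path for your distribution):

```shell
#!/bin/sh
# Extract "task pid=NNN" pairs from hung-task warnings in a saved syslog.
# extract_hung_tasks and the /tmp path are illustrative names.
extract_hung_tasks() {
    grep 'INFO: task' "$1" \
        | sed 's/.*INFO: task \([^:]*\):\([0-9]*\) blocked.*/\1 pid=\2/'
}

# Example against two lines taken verbatim from the traces above
# (in practice: extract_hung_tasks /var/log/kern.log):
cat > /tmp/hung_sample.log <<'EOF'
Apr 29 11:01:18 node06 kernel: [2569440.616036] INFO: task ocfs2_wq:5214 blocked for more than 120 seconds.
Apr 29 11:00:23 node01 kernel: [2570880.752038] INFO: task o2quot/0:2971 blocked for more than 120 seconds.
EOF
extract_hung_tasks /tmp/hung_sample.log
# → ocfs2_wq pid=5214
# → o2quot/0 pid=2971
```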
Cannot say for sure. It could be a deadlock (a bug) too; I don't want to blame any one component without knowing more.

If it were up to me, I'd start with the dlm. See which node holds the lock that the others are waiting on. Then see why that node is unable to downconvert that lock. If the lock has holders, try to determine the pids holding it and see where they are stuck. On mainline kernels you can run "cat /proc/PID/stack" to look at the stack of a pid.

Marco wrote:
> By looking at the timestamps it seems that o2quot got stuck before
> ocfs2_wq, but right now I can't guarantee that they are 100% exact...
>
> Am I right if I think it has been a hardware failure?

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
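The triage described above could be sketched roughly like this. This assumes the o2cb stack and root access; the device path is a placeholder, and `debugfs.ocfs2` output formats vary by ocfs2-tools version:

```shell
#!/bin/sh
# Rough triage sketch for a stuck cluster lock (o2cb stack assumed).

# Filter that keeps pids of tasks in uninterruptible sleep (state D),
# like ocfs2_wq and o2quot in the traces above.
d_state_pids() {
    awk '$2 ~ /^D/ {print $1}'
}

# 1. Find currently stuck tasks.
stuck=$(ps -eo pid=,stat=,comm= | d_state_pids)

# 2. On mainline kernels, dump each stuck task's kernel stack (needs root).
for pid in $stuck; do
    echo "=== pid $pid ==="
    cat "/proc/$pid/stack" 2>/dev/null
done

# 3. Ask ocfs2 which lock resources are busy and who holds them;
#    uncomment and point at your ocfs2 device (placeholder path):
# debugfs.ocfs2 -R "fs_locks" /dev/mapper/myvol
```

Running this on each node and comparing the busy lock resources should show which node is failing to downconvert.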
At my job we use QLA FC boards, and we had some problems when we migrated from FreeBSD to SLES (in mid 2007). We found two problems at the time:

1. Old board firmware - fixed by flashing it to the latest version;
2. The stock SLES10 qla2xxx driver was outdated - fixed by downloading the driver from the vendor, then compiling and installing it*.

* SLES10SP3 already ships the newer version as the stock driver; SLES10, SLES10SP1 and SLES10SP2 have outdated drivers.

I can't tell whether you have the same problem, as I don't remember the kernel error messages, but I do remember it was very weird: sometimes it worked perfectly, and sometimes the board wasn't even detected.

Can you tell us more about your environment?

Regards,
Sérgio

On Thu, 29 Apr 2010 12:56:38 +0200, Marco <bozzolan at gmail.com> wrote:
> By looking at the timestamps it seems that o2quot got stuck before
> ocfs2_wq, but right now I can't guarantee that they are 100% exact...
>
> Am I right if I think it has been a hardware failure?

-- 
Sérgio Surkamp | Network Manager
sergio at gruposinternet.com.br
*Grupos Internet S.A.*
R. Lauro Linhares, 2123 Torre B - Sala 201
Trindade - Florianópolis - SC
+55 48 3234-4109
http://www.gruposinternet.com.br