Guozhonghua
2012-Jun-21 03:46 UTC
[Ocfs2-devel] echo 0 > /proc/sys/kernel/hung_task_timeout_secs and other errors, Part II
The first problem is as below:
One issue is that files copied to the device cannot be listed on node2 with ls -al on the mounted directory.
But using debugfs.ocfs2 on node2, the copied files are listed correctly. After remounting the device on node2, the files can be listed.

The second is that:
Node1 is in the ocfs2 cluster, but neither debugfs.ocfs2 nor the mounted.ocfs2 -f command lists node1.
Node2 and node3 are listed, and listing the slot map with debugfs.ocfs2 shows no entry for node1.
But the heartbeat information on disk is ok.

And there are a lot of "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" messages in the log.

We formatted the device for 32 nodes with the command:
mkfs.ocfs2 -b 4k -C 1M -L target100 -T vmstore -N 32 /dev/sdb

So we had to delete the ocfs2 cluster, reboot the nodes, and rebuild ocfs2.
After all nodes joined the cluster, we copied data again, and the "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" messages still appear.

Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.006781] INFO: task cp:22285 blocked for more than 120 seconds.
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.016123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034724] cp              D ffffffff81806240     0 22285   5313 0x00000000
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034729] ffff881b952658b0 0000000000000082 0000000000000000 0000000000000001
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034739] ffff881b95265fd8 ffff881b95265fd8 ffff881b95265fd8 0000000000013780
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034751] ffff880fc16044d0 ffff881fbe41ade0 ffff882027c13780 ffff881fbe41ade0
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034762] Call Trace:
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034769] [<ffffffff8165a55f>] schedule+0x3f/0x60
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034777] [<ffffffff8165c35d>] rwsem_down_failed_common+0xcd/0x170
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034808] [<ffffffffa059d399>] ? ocfs2_metadata_cache_unlock+0x19/0x20 [ocfs2]
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034815] [<ffffffff8165c435>] rwsem_down_read_failed+0x15/0x17
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034826] [<ffffffff813188d4>] call_rwsem_down_read_failed+0x14/0x30
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034833] [<ffffffff8165b754>] ? down_read+0x24/0x2b
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034859] [<ffffffffa0553b11>] ocfs2_start_trans+0xe1/0x1e0 [ocfs2]
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034878] [<ffffffffa052ab35>] ocfs2_write_begin_nolock+0x945/0x1c40 [ocfs2]
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034903] [<ffffffffa054cb90>] ? ocfs2_inode_is_valid_to_delete+0x1f0/0x1f0 [ocfs2]
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034927] [<ffffffffa053fa9c>] ? ocfs2_inode_lock_full_nested+0x52c/0xa90 [ocfs2]
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034939] [<ffffffff81647ae2>] ? balance_dirty_pages.isra.17+0x457/0x4ba
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034959] [<ffffffffa052bf26>] ocfs2_write_begin+0xf6/0x210 [ocfs2]
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034968] [<ffffffff8111752a>] generic_perform_write+0xca/0x210
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034991] [<ffffffffa053d9b9>] ? ocfs2_inode_unlock+0xb9/0x130 [ocfs2]
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034998] [<ffffffff811176cd>] generic_file_buffered_write+0x5d/0x90
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035023] [<ffffffffa054c601>] ocfs2_file_aio_write+0x821/0x870 [ocfs2]
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035032] [<ffffffff81177342>] do_sync_write+0xd2/0x110
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035043] [<ffffffff812d7448>] ? apparmor_file_permission+0x18/0x20
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035052] [<ffffffff8129cc9c>] ? security_file_permission+0x2c/0xb0
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035058] [<ffffffff811778d1>] ? rw_verify_area+0x61/0xf0
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035064] [<ffffffff81177c33>] vfs_write+0xb3/0x180
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035070] [<ffffffff81177f5a>] sys_write+0x4a/0x90
Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035077] [<ffffffff81664a82>] system_call_fastpath+0x16/0x1b

Is there some better advice or practice? Or is there some bug?

The OS information is as below; all four nodes are installed the same:
3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

The host information is as below:
# free
             total       used       free     shared    buffers     cached
Mem:     132028152  104355680   27672472          0     171496   69113032
-/+ buffers/cache:   35071152   96957000
Swap:     34523132          0   34523132

CPU information:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Stepping:              2
CPU MHz:               2532.792
BogoMIPS:              5065.22
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23

Thanks
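A minimal way to cross-check these symptoms, for reference: compare what the mounted filesystem reports with what debugfs.ocfs2 reads straight from disk, and inspect the slot map and mount records. The mount point and directory below are placeholders; only /dev/sdb comes from the mkfs.ocfs2 command above.

  # What the mounted filesystem shows on node2 (mount point is assumed):
  ls -al /mnt/target100

  # What is actually recorded on disk, read directly by debugfs.ocfs2:
  debugfs.ocfs2 -R "ls -l /" /dev/sdb

  # Slot map on disk, and the nodes detected as having the volume mounted:
  debugfs.ocfs2 -R "slotmap" /dev/sdb
  mounted.ocfs2 -f /dev/sdb

  # Only silences the hung-task warning; it does not unblock the cp task:
  sysctl -w kernel.hung_task_timeout_secs=0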
Joel Becker
2012-Jun-21 04:54 UTC
[Ocfs2-users] [Ocfs2-devel] echo 0 > /proc/sys/kernel/hung_task_timeout_secs and other errors, Part II
On Thu, Jun 21, 2012 at 03:46:32AM +0000, Guozhonghua wrote:
> The first problem is as below:
> One issue is that files copied to the device cannot be listed on node2 with ls -al on the mounted directory.
> But using debugfs.ocfs2 on node2, the copied files are listed correctly. After remounting the device on node2, the files can be listed.

This is the kind of thing you see when locking gets unhappy. You copy
on node1, it writes to the disk, but somehow node2 has not noticed.
Thus, you can see the data on disk (debugfs.ocfs2), but not via the
filesystem.
	What kind of storage is this? How are node1, node2, and node3
attached to it? How do they talk to each other?

> The second is that:
> Node1 is in the ocfs2 cluster, but neither debugfs.ocfs2 nor the mounted.ocfs2 -f command lists node1.
> Node2 and node3 are listed, and listing the slot map with debugfs.ocfs2 shows no entry for node1.

This is very interesting.

Joel

> But the heartbeat information on disk is ok.
>
> And there are a lot of "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" messages in the log.
>
> We formatted the device for 32 nodes with the command:
> mkfs.ocfs2 -b 4k -C 1M -L target100 -T vmstore -N 32 /dev/sdb
>
> So we had to delete the ocfs2 cluster, reboot the nodes, and rebuild ocfs2.
> After all nodes joined the cluster, we copied data again, and the "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" messages still appear.
>
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.006781] INFO: task cp:22285 blocked for more than 120 seconds.
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.016123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034724] cp              D ffffffff81806240     0 22285   5313 0x00000000
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034729] ffff881b952658b0 0000000000000082 0000000000000000 0000000000000001
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034739] ffff881b95265fd8 ffff881b95265fd8 ffff881b95265fd8 0000000000013780
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034751] ffff880fc16044d0 ffff881fbe41ade0 ffff882027c13780 ffff881fbe41ade0
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034762] Call Trace:
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034769] [<ffffffff8165a55f>] schedule+0x3f/0x60
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034777] [<ffffffff8165c35d>] rwsem_down_failed_common+0xcd/0x170
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034808] [<ffffffffa059d399>] ? ocfs2_metadata_cache_unlock+0x19/0x20 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034815] [<ffffffff8165c435>] rwsem_down_read_failed+0x15/0x17
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034826] [<ffffffff813188d4>] call_rwsem_down_read_failed+0x14/0x30
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034833] [<ffffffff8165b754>] ? down_read+0x24/0x2b
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034859] [<ffffffffa0553b11>] ocfs2_start_trans+0xe1/0x1e0 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034878] [<ffffffffa052ab35>] ocfs2_write_begin_nolock+0x945/0x1c40 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034903] [<ffffffffa054cb90>] ? ocfs2_inode_is_valid_to_delete+0x1f0/0x1f0 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034927] [<ffffffffa053fa9c>] ? ocfs2_inode_lock_full_nested+0x52c/0xa90 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034939] [<ffffffff81647ae2>] ? balance_dirty_pages.isra.17+0x457/0x4ba
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034959] [<ffffffffa052bf26>] ocfs2_write_begin+0xf6/0x210 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034968] [<ffffffff8111752a>] generic_perform_write+0xca/0x210
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034991] [<ffffffffa053d9b9>] ? ocfs2_inode_unlock+0xb9/0x130 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034998] [<ffffffff811176cd>] generic_file_buffered_write+0x5d/0x90
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035023] [<ffffffffa054c601>] ocfs2_file_aio_write+0x821/0x870 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035032] [<ffffffff81177342>] do_sync_write+0xd2/0x110
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035043] [<ffffffff812d7448>] ? apparmor_file_permission+0x18/0x20
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035052] [<ffffffff8129cc9c>] ? security_file_permission+0x2c/0xb0
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035058] [<ffffffff811778d1>] ? rw_verify_area+0x61/0xf0
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035064] [<ffffffff81177c33>] vfs_write+0xb3/0x180
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035070] [<ffffffff81177f5a>] sys_write+0x4a/0x90
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035077] [<ffffffff81664a82>] system_call_fastpath+0x16/0x1b
>
> Is there some better advice or practice? Or is there some bug?
>
> The OS information is as below; all four nodes are installed the same:
> 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> The host information is as below:
> # free
>              total       used       free     shared    buffers     cached
> Mem:     132028152  104355680   27672472          0     171496   69113032
> -/+ buffers/cache:   35071152   96957000
> Swap:     34523132          0   34523132
>
> CPU information:
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                24
> On-line CPU(s) list:   0-23
> Thread(s) per core:    2
> Core(s) per socket:    6
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 44
> Stepping:              2
> CPU MHz:               2532.792
> BogoMIPS:              5065.22
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              12288K
> NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
> NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23
>
> Thanks
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel

-- 
"People with narrow minds usually have broad tongues."

			http://www.jlbec.org/
jlbec at evilplan.org
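The storage and interconnect questions above can usually be answered from the o2cb configuration and the o2net kernel messages on each node. A sketch, assuming the stock o2cb stack on this Ubuntu install; the service name and paths are assumptions, not taken from the report:

  # Node names, interconnect IP addresses and the o2net port of every node:
  cat /etc/ocfs2/cluster.conf

  # Whether the cluster stack and disk heartbeat are online on this node:
  service o2cb status

  # o2net connect/disconnect messages for the cluster interconnect;
  # a node that never establishes its link to node1 points at a network problem:
  dmesg | grep -i o2net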