Jeremy Schneider
2003-Dec-10 12:44 UTC
[Ocfs-users] [BUG] node 0 hangs until disk unmounted on node 1
I'm currently part of a project implementing Oracle eBusiness Suite 11i on RAC. We're using a two-node cluster with shared storage; both nodes are configured identically. Kernel is 2.4.9-e.27enterprise and ocfs is 1.0.9-11. I have checked, and the shared storage can be accessed directly without any problems from both nodes (/dev/sdx).

Curious if anyone has any suggestions or comments regarding a problem we've been having. After mounting the ocfs partitions, eventually one of the nodes will hang: the oracle server processes get stuck in a "D" (disk wait) state, and when I go into the folder with the datafiles and type "ls", that process also hangs in a "D" state. Interestingly, I can list the contents of other folders, but as soon as I try to list the contents of the directory with the datafiles, the process hangs in disk wait. "strace -p" also hangs when I try to run it on the process.

This only happens when the volume is mounted on both nodes. Last Friday I had several oracle processes hung and several terminal windows with hung /bin/ls processes. The *moment* I unmounted /u02 from the other node, *all* of the "D" processes (even strace) instantly came out of disk wait and continued. Seems like an ocfs issue to me.

Any ideas how I can further narrow this problem down? Would a process dump from the magic sysrq key (t) help? Or the wait channel from a "ps l"? Is one node designated a "master" node, and would identifying whether or not this happened on the master node help?

Regards,
Jeremy

Jeremy Schneider
Systems/Database Administrator
The ASU Group - IS Dept
email: jer1887@asugroup.com

Life is either a daring adventure or nothing. -- Helen Keller, Let Us Have Faith
Rui Amaral
2003-Dec-10 13:02 UTC
[Ocfs-users] [BUG] node 0 hangs until disk unmounted on node 1
I had something along those lines too and worked it out with an HP engineer. We were using an HP XP512 as the shared disk, with dual channels to the disk. The nodes hung because we had the disk mounted using the secondary channel as opposed to the primary channel (something about cluster filesystems not liking a secondary channel to shared disk; I'm not exactly clear on the details, though). Once I mounted the disk using the primary channel, the problem went away.

Maybe you are having this issue? Just something I have encountered.

HOH
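On the question in the original post about capturing wait channels: one rough way to collect the same information that "ps l" shows is to read /proc directly. The sketch below is only an illustration, not something from the thread. It assumes a Linux /proc layout in which the third field of /proc/<pid>/stat is the process state and the 35th field is the wait channel as a numeric kernel address (resolving that address to a symbol name would still need the matching System.map). It is written in Python 3 syntax, which the 2.4.9 systems described above would not have shipped with, and the function names parse_stat and uninterruptible are hypothetical.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, wchan)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    head, _, rest = text.partition("(")
    comm, _, tail = rest.rpartition(")")
    fields = tail.split()
    pid = int(head)
    state = fields[0]  # field 3 of the stat line
    # wchan is field 35; 'fields' starts at field 3, so it sits at index 32.
    wchan = int(fields[32]) if len(fields) > 32 else 0
    return pid, comm, state, wchan


def uninterruptible(proc_root="/proc"):
    """Return (pid, comm, wchan) for every process currently in state 'D'."""
    found = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, name, "stat")) as fh:
                pid, comm, state, wchan = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was unparseable
        if state == "D":
            found.append((pid, comm, wchan))
    return found


if __name__ == "__main__":
    for pid, comm, wchan in uninterruptible():
        print("%d\t%s\twchan=0x%x" % (pid, comm, wchan))
```

Running this on the hung node while the volume is mounted on both nodes, and again after unmounting it on the other node, would show whether the stuck processes share a wait channel. The script only reads each process's stat file, so it should not touch the ocfs volume itself.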