Hi List,

I have a problem with my OCFS2 volume. I have two web servers connected to EqualLogic shared storage via Open-iSCSI. These web servers host our web presence (Typo3 CMS) and some forums (phpBB 3).

We have encountered two cases in which the OCFS2 cluster hangs:

1. CPU usage on one node rises to 100% because of high MySQL usage or several "convert" processes.
2. The iSCSI storage is unreachable for some reason (maybe a switch hardware error, power loss, ...).

In case 1 both nodes seem to be unable to release locks. When I run the command

ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | grep D

I get a lot of httpd2-prefork processes and some convert processes with status "D" on both nodes. Some of them have "ocfs2_lock" in the wchan column. I am unable to kill any of these processes: neither kill <pid> nor killall -9 httpd2-prefork kills them. I had to power off both nodes; rebooting is not possible in this case.

In case 2 both nodes are available and idle. All interfaces, including heartbeat, are up and running. When I then pull the iSCSI network cable from one node, that node fences itself and reboots after 1 minute. This works as expected. But the other node does not deliver the web site until the "failed" node has rebooted.

In both cases the file system is read-only on at least one node. Rebooting both machines does not bring the volume back to read-write; I have to unmount and mount the volume manually.

I have done some tests with only one node running. The behavior is the same as described in case 1, and the file system is read-only.

Does anybody have a hint about what I can do to prevent the cluster from hanging when the heartbeat network is up but one node has lost its connection to the OCFS2 volume? Is there a way to prevent the file system from going into read-only mode?

Thanks for any help.

Best regards,
Christian

EPLAN Software & Service GmbH & Co. KG - An der alten Ziegelei 2 - D-40789 Monheim
Phone +49 (02173) 3964-190 - Fax +49 (02173) 3964-40190
mailto: Loebbert.C at eplan.de
http://www.eplan.de - Friedhelm-Loh-Group: www.friedhelm-loh-group.com
Registered office: Monheim, Amtsgericht Düsseldorf, HRA 16335 - General partner: FL Software GmbH, registered office: Haiger, Amtsgericht Wetzlar, HRB 3513 - Management: Norbert M?ller (Chairman), Hans H?ssig
This email contains confidential information. You are not authorised to copy the contents without the consent of the sender.
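A note on the ps check in the post above: a plain "grep D" also matches any line that merely contains a capital D somewhere, including the header line (COMMAND, WIDE-WCHAN-COLUMN) and command names. The following is a minimal sketch, not a definitive tool, that filters on the STAT column instead. The function names filter_d_state and list_d_state are made up for this illustration; the ps options are the ones quoted in the post.

```shell
# Keep only lines whose second column (STAT) starts with "D",
# i.e. processes in uninterruptible sleep. Reads ps-style lines
# ("PID STAT COMMAND WCHAN") on stdin. Hypothetical helper name.
filter_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# Run the same ps invocation as in the post and filter it.
# Hypothetical helper name; the ps options come from the post.
list_d_state() {
    ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | filter_d_state
}
```

Calling list_d_state on each node should then show only the processes stuck in state D, with the kernel function they are waiting in (for example ocfs2_lock) in the last column.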
Please remove me from this email list.

________________________________
From: ocfs2-users-bounces at oss.oracle.com on behalf of Loebbert.C at eplan.de
Sent: Mon 2/15/2010 9:45 AM
To: ocfs2-users at oss.oracle.com
Subject: [Ocfs2-users] problems with OCFS2 locking