thr3ads.net - Ocfs2 users - [Ocfs2-users] problems with OCFS2 locking [Feb 2010]

Loebbert.C at eplan.de

2010-Feb-15 14:45 UTC

[Ocfs2-users] problems with OCFS2 locking

Hi List,

 

I have a problem with my OCFS2 volume. I have two web servers connected to an
EqualLogic shared storage array via Open-iSCSI. These web servers host our web
presence (Typo3 CMS) and some forums (phpBB 3). We have encountered two cases in
which the OCFS2 cluster hangs:

 

1. CPU usage on one node increases to 100% because of high MySQL usage or
several "convert" processes.

2. The iSCSI storage is unreachable for some reason (maybe a switch hardware
error, power loss, ...).

 

In case 1 both nodes seem to be unable to release locks. When I run the command
"ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | grep D" I get a lot
of httpd2-prefork processes and some convert processes with status "D"
on both nodes. Some of them have "ocfs2_lock" in the wchan column. I am
unable to kill any of these processes: neither "kill <process id>" nor
"killall -9 httpd2-prefork" terminates them. I had to power off
both nodes. Rebooting is not possible in this case.
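
A note on the command above: "grep D" matches any line that contains a capital D, including the header line, so it can show more than the processes in state "D". The following is a minimal sketch that filters on the STAT column instead. The function names filter_d_state and list_d_state are made up for illustration; the ps invocation is the same one quoted above.

```shell
# Keep the header line plus every row whose second column (STAT)
# starts with "D", i.e. processes in uninterruptible sleep.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Same ps call as in the message above, piped through the filter.
list_d_state() {
    ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | filter_d_state
}
```

Running list_d_state on each node should then list only the blocked processes together with their wait channel.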

 

In case 2 both nodes are available and idle. All interfaces, including heartbeat,
are up and running. Then, when I pull the iSCSI network cable from one node, that
node fences itself and reboots after 1 minute. This works as expected. But the
other node does not serve the web site until the "failed" node has
rebooted.
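
The one-minute delay would be consistent with the O2CB disk heartbeat timeout, which the OCFS2 FAQ describes as (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds. The sketch below only does that arithmetic. The threshold of 31 used in the example is an assumption (it is the commonly documented default), since the value configured on these nodes is not stated here, and the function name is made up for illustration.

```shell
# Disk heartbeat timeout in seconds for a given O2CB_HEARTBEAT_THRESHOLD,
# using the relation (threshold - 1) * 2.
o2cb_timeout_seconds() {
    echo $(( ($1 - 1) * 2 ))
}

# Assumed threshold of 31, not read from this cluster's configuration.
o2cb_timeout_seconds 31
```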

 

In both cases the file system is read-only on at least one node. Rebooting both
machines does not bring the volume back to read-write. I have to unmount and
mount the volume manually.
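
To see on which node the volume has gone read-only, the mount table can be checked directly. This is a minimal sketch, assuming the standard /proc/mounts layout (device, mount point, file system type, options); the function name ro_ocfs2_mounts is made up for illustration.

```shell
# Read a mount table in /proc/mounts format on stdin and print the
# mount point of every ocfs2 file system whose options include "ro".
ro_ocfs2_mounts() {
    awk '$3 == "ocfs2" && $4 ~ /(^|,)ro(,|$)/ { print $2 }'
}
```

On a node this would be called as "ro_ocfs2_mounts < /proc/mounts"; empty output means no OCFS2 volume is currently mounted read-only there.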

 

I have done some tests with only one node running. The behavior is the same as
described in case 1, and the file system is read-only.

 

Does anybody have a hint on what I can do to prevent the cluster from hanging
when the heartbeat network is up but one node has lost its connection to the
OCFS2 volume?

Is there a way to prevent the file system from going into read-only mode?

 

Thanks for any help.

 

Best regards,

Christian
EPLAN Software & Service GmbH & Co. KG - An der alten Ziegelei 2 -
D-40789 Monheim
Phone +49 (02173) 3964-190  -  Fax +49 (02173) 3964-40190
mailto: Loebbert.C at eplan.de
http://www.eplan.de - Friedhelm-Loh-Group: www.friedhelm-loh-group.com

Registered office: Monheim, Düsseldorf District Court, HRA 16335 - General
partner: FL Software GmbH, registered office: Haiger, Wetzlar District Court,
HRB 3513 - Management: Norbert Müller (Chairman), Hans H?ssig

This email contains confidential information. You are not authorised to copy the
contents without the consent of the sender.


Robert Wallace

2010-Feb-15 16:52 UTC

[Ocfs2-users] problems with OCFS2 locking

Please remove me from this email list.

