thr3ads.net - Ocfs users - [Ocfs-users] Hard restart of the nodes after loosing connection [Sep 2011]


Sergey Bolbat

2011-Sep-14 05:08 UTC

[Ocfs-users] Hard restart of the nodes after loosing connection

Hello,
I am seeing strange behavior with my OCFS2 setup.
I have two servers, one running Debian squeeze and the other Debian sid.
Both run a 2.6.32 kernel and have ocfs2-tools 1.6.3-2 installed.
They are connected over iSCSI (open-iscsi) to an HP StorageWorks MSA
2012, using one of its controllers.
The storage system has problems with its power supply unit and sometimes
restarts (this began recently and I'm working on fixing it). When it
hangs and restarts, both of my OCFS2 nodes connected to the storage
system restart too.
I don't think this is right: these are production servers, and I could
lose data.

This is what I caught on the console when a node went into a restart:

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70629.910077] general protection fault: 0000 [#1] SMP

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70629.910118] last sysfs file: /sys/fs/o2cb/interface_revision

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70629.911397] Stack:

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70629.911603] Call Trace:

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70629.911948] Code: fa 66 0f 1f 44 00 00 65 8b 04 25 a8 e3 00 00 48 98 49 8b 94 c4 f8 02 00 00 8b 4a 18 89 4c 24 14 48 8b 1a 48 85 db 74 0c 8b 42 14 <48> 8b 04 c3 48 89 02 eb 1d 48 8b 4c 24 08 49 89 d0 89 ee 83 ca

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70630.289837] general protection fault: 0000 [#2] SMP

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70630.289978] last sysfs file: /sys/fs/o2cb/interface_revision

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70630.295417] Stack:

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70630.296222] Call Trace:

Message from syslogd at urta at Sep 13 18:34:09 ...
 kernel:[70630.296828] Code: fa 66 0f 1f 44 00 00 65 8b 04 25 a8 e3 00 00 48 98 49 8b 94 c4 f8 02 00 00 8b 4a 18 89 4c 24 14 48 8b 1a 48 85 db 74 0c 8b 42 14 <48> 8b 04 c3 48 89 02 eb 1d 48 8b 4c 24 08 49 89 d0 89 ee 83 ca

I'd just like some advice on what I can change in the OCFS2 configuration
to prevent the nodes from restarting when the storage has problems.

Thanks for any help.
