Sergey Bolbat
2011-Sep-14 05:08 UTC
[Ocfs-users] Hard restart of the nodes after losing connection
Hello,

I have a strange problem with my OCFS2 setup. I have two servers, one running Debian squeeze and the other Debian sid. Both run a 2.6.32 kernel and have ocfs2-tools 1.6.3-2 installed. They are connected over iSCSI (open-iscsi) to an HP StorageWorks MSA 2012, using one of its controllers.

The storage system has a problem with its power supply unit and sometimes restarts (this began recently and I'm working on fixing it). When it hangs and restarts, both OCFS2 nodes connected to it restart too. I don't think this is right: these are production servers, and I could lose data.

This is what I caught on the console when a node went down for a restart:

Message from syslogd at urta at Sep 13 18:34:09 ...
kernel:[70629.910077] general protection fault: 0000 [#1] SMP

Message from syslogd at urta at Sep 13 18:34:09 ...
kernel:[70629.910118] last sysfs file: /sys/fs/o2cb/interface_revision

Message from syslogd at urta at Sep 13 18:34:09 ...
kernel:[70629.911397] Stack:

Message from syslogd at urta at Sep 13 18:34:09 ...
kernel:[70629.911603] Call Trace:

Message from syslogd at urta at Sep 13 18:34:09 ...
kernel:[70629.911948] Code: fa 66 0f 1f 44 00 00 65 8b 04 25 a8 e3 00 00 48 98 49 8b 94 c4 f8 02 00 00 8b 4a 18 89 4c 24 14 48 8b 1a 48 85 db 74 0c 8b 42 14 <48> 8b 04 c3 48 89 02 eb 1d 48 8b 4c 24 08 49 89 d0 89 ee 83 ca

Message from syslogd at urta at Sep 13 18:34:09 ...
kernel:[70630.289837] general protection fault: 0000 [#2] SMP

Message from syslogd at urta at Sep 13 18:34:09 ...
kernel:[70630.289978] last sysfs file: /sys/fs/o2cb/interface_revision

Message from syslogd at urta at Sep 13 18:34:09 ...
kernel:[70630.295417] Stack:

Message from syslogd at urta at Sep 13 18:34:09 ...
kernel:[70630.296222] Call Trace:

Message from syslogd at urta at Sep 13 18:34:09 ...
kernel:[70630.296828] Code: fa 66 0f 1f 44 00 00 65 8b 04 25 a8 e3 00 00 48 98 49 8b 94 c4 f8 02 00 00 8b 4a 18 89 4c 24 14 48 8b 1a 48 85 db 74 0c 8b 42 14 <48> 8b 04 c3 48 89 02 eb 1d 48 8b 4c 24 08 49 89 d0 89 ee 83 ca

I'd just like some advice on what I can change in the OCFS2 configuration to prevent system restarts caused by storage problems.

Thanks for any help.