mwoods@fnal.gov
2007-Jul-12  14:25 UTC
[Ocfs2-users] problem on one member caused the other member to panic.
I have a two-system cluster running ocfs2 1.2.5-1. An OS kernel patch was errantly applied to one of the machines without the ocfs2 module being installed for the new kernel. When the errant machine rebooted, it of course couldn't mount any ocfs2 file systems. Totally expected and easily rectified. What was unexpected was that this caused the following messages on the other cluster member and caused it to reboot.

I am assuming this happened because I have something misconfigured.

What caused a problem on one system to panic the other system, and how can I prevent this from happening in the future?

Thanks.

Jul 12 06:04:49 finps03 kernel: (0,0):o2net_idle_timer:1418 here are some times that might help debug the situation: (tmr 1184238259.664319 now 1184238289.659909 dr 1184238259.664303 adv 1184238259.664349:1184238259.664353 func (89487f70:1) 1184238259.664324:1184238259.664339)
Jul 12 06:04:49 finps03 kernel: o2net: no longer connected to node xxxx.fnal.gov (num 0) at ttt.sss.xxx.yyy:7777
Jul 12 06:04:49 finps03 kernel: (6006,10):ocfs2_process_vote:494 ERROR: message to node 0 fails with error -112!
Jul 12 06:04:49 finps03 kernel: (6007,5):dlm_send_proxy_ast_msg:459 ERROR: status = -112
Jul 12 06:04:49 finps03 kernel: (6007,5):dlm_flush_asts:600 ERROR: status -112
Sunil Mushran
2007-Jul-13  20:08 UTC
[Ocfs2-users] problem on one member caused the other member to panic.
Very hard to reconstruct the events with just a snippet. File a bugzilla and attach all the relevant messages files. Also mention the time of the crash.
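For anyone reading the snippet in the meantime, the numbers in it can at least be decoded. A minimal sketch, assuming that "tmr" is the time the o2net idle timer was last armed and "now" is the time it fired (that reading of the fields is an assumption, not something stated in this thread), and that the script runs on Linux, where errno values are platform-specific:

```python
import errno
from decimal import Decimal

# Epoch timestamps copied from the o2net_idle_timer log line.
# Assumption: "tmr" = when the idle timer was last armed, "now" = when it fired.
tmr = Decimal("1184238259.664319")
now = Decimal("1184238289.659909")

# Seconds during which nothing reset the timer for node 0.
gap = now - tmr
print(f"idle gap: {gap} s")

# The ERROR lines report -112. Look up the symbolic name on this platform;
# errno numbering differs between operating systems, so fall back gracefully.
code = 112
name = errno.errorcode.get(code, "unknown on this platform")
print(f"errno {code}: {name}")
```

The gap comes out at just under 30 seconds, which would fit a network idle timeout expiring while node 0 was down. This does not by itself explain the reboot; the full messages files are still needed for that.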