AZabelin@topsbi.ru
2005-Aug-19 16:22 UTC
[Ocfs2-users] Kernel panic: ocfs2 is very sorry to be fencing this system by panicing
Hello,

Could you please advise where to look for a solution to this problem? When host2 is rebooted, ocfs2 keeps working fine. But when host1 is rebooted, host2 logs the following in /var/log/messages:

"o2net_check_quorum:1468 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 0"

and

"Kernel panic: ocfs2 is very sorry to be fencing this system by panicing".

The relevant part of /var/log/messages from both nodes is attached.

-------------- next part --------------
Aug 15 10:39:00 host2 exiting on signal 15
Aug 15 10:39:10 host1 kernel: (0,0):o2net_state_change:512 connection to node host2 num 1 at 10.0.1.2:7777 has been idle for 10 seconds, shutting it down.
Aug 15 10:39:10 host1 kernel: (6584,0):o2net_set_nn_state:414 no longer connected to node host2 at 10.0.1.2:7777
Aug 15 10:39:46 host1 kernel: (7913,1):dlm_send_proxy_ast_msg:428 ERROR: status = -107
Aug 15 10:39:46 host1 kernel: (7913,1):dlm_flush_asts:572 ERROR: status = -107
Aug 15 10:52:50 host1 kernel: (7913,3):dlm_send_proxy_ast_msg:428 ERROR: status = -107
Aug 15 10:52:50 host1 kernel: (7913,3):dlm_flush_asts:572 ERROR: status = -107
Aug 15 10:52:50 host1 kernel: (8021,1):dlm_send_proxy_ast_msg:428 ERROR: status = -107
Aug 15 10:52:50 host1 kernel: (8021,1):dlm_flush_asts:572 ERROR: status = -107
Aug 15 10:52:50 host1 kernel: (1340,2):ocfs2_replay_journal:1123 Recovering node 1 from slot 1 on device (8,16)
Aug 15 10:52:50 host1 kernel: (1341,0):ocfs2_replay_journal:1123 Recovering node 1 from slot 1 on device (8,48)
Aug 15 10:52:51 host1 kernel: kjournald starting. Commit interval 5 seconds
Aug 15 10:52:51 host1 kernel: kjournald starting. Commit interval 5 seconds
Aug 15 10:55:50 host2 syslogd 1.4.1: restart.
Aug 15 10:55:55 host2 kernel: OCFS2 Node Manager 0.99.15-SLES Mon Jun 27 17:33:57 PDT 2005 (build sles)
Aug 15 10:55:55 host2 kernel: OCFS2 DLM 0.99.15-SLES Mon Jun 27 17:33:57 PDT 2005 (build sles)
Aug 15 10:55:55 host2 kernel: OCFS2 DLMFS 0.99.15-SLES Mon Jun 27 17:33:57 PDT 2005 (build sles)
Aug 15 10:55:55 host2 kernel: OCFS2 User DLM kernel interface loaded
Aug 15 10:56:00 host1 kernel: (6584,0):o2net_set_nn_state:431 accepted connection from node host2 num 1 at 10.0.1.2:7777
Aug 15 10:56:02 host2 kernel: (6638,0):o2net_set_nn_state:431 connected to node host1 num 0 at 10.0.1.1:7777
Aug 15 10:56:04 host1 kernel: (6584,0):__dlm_print_nodes:379 Nodes in my domain ("F710595BBF68481A9D08F42DF6FCA92D"):
Aug 15 10:56:04 host1 kernel: (6584,0):__dlm_print_nodes:383 node 0
Aug 15 10:56:04 host1 kernel: (6584,0):__dlm_print_nodes:383 node 1
Aug 15 10:56:06 host2 kernel: OCFS2 0.99.15-SLES Mon Jun 27 17:33:57 PDT 2005 (build sles)
Aug 15 10:56:06 host2 kernel: (7924,3):ocfs2_initialize_osb:1179 max_slots for this device: 2
Aug 15 10:56:07 host2 kernel: (7924,3):ocfs2_fill_local_node_info:851 I am node 1
Aug 15 10:56:07 host2 kernel: (7924,1):__dlm_print_nodes:379 Nodes in my domain ("F710595BBF68481A9D08F42DF6FCA92D"):
Aug 15 10:56:07 host2 kernel: (7924,1):__dlm_print_nodes:383 node 0
Aug 15 10:56:07 host2 kernel: (7924,1):__dlm_print_nodes:383 node 1
Aug 15 10:56:07 host2 kernel: (7924,1):ocfs2_find_slot:266 taking node slot 1
Aug 15 10:56:07 host2 kernel: kjournald starting. Commit interval 5 seconds
Aug 15 10:56:07 host2 kernel: ocfs2: Mounting device (8,16) on (node 1, slot 1)
Aug 15 10:56:08 host1 kernel: (6584,0):__dlm_print_nodes:379 Nodes in my domain ("D414ADF107F84066A6B32BBD6F25C55E"):
Aug 15 10:56:08 host1 kernel: (6584,0):__dlm_print_nodes:383 node 0
Aug 15 10:56:08 host1 kernel: (6584,0):__dlm_print_nodes:383 node 1
Aug 15 10:56:11 host2 kernel: (8067,3):ocfs2_initialize_osb:1179 max_slots for this device: 4
Aug 15 10:56:11 host2 kernel: (8067,3):ocfs2_fill_local_node_info:851 I am node 1
Aug 15 10:56:11 host2 kernel: (8067,1):__dlm_print_nodes:379 Nodes in my domain ("D414ADF107F84066A6B32BBD6F25C55E"):
Aug 15 10:56:11 host2 kernel: (8067,1):__dlm_print_nodes:383 node 0
Aug 15 10:56:11 host2 kernel: (8067,1):__dlm_print_nodes:383 node 1
Aug 15 10:56:11 host2 kernel: (8067,1):ocfs2_find_slot:266 taking node slot 1
Aug 15 10:56:11 host2 kernel: kjournald starting. Commit interval 5 seconds
Aug 15 10:56:11 host2 kernel: ocfs2: Mounting device (8,48) on (node 1, slot 1)
Aug 15 10:56:11 host2 logger: (Oracle CSSD will be run out of init)
Aug 15 10:56:11 host2 logger: (Oracle EVMD will be run out of init)
Aug 15 10:56:11 host2 logger: (Oracle CRSD will be run out of init, set to start boot services)
Aug 15 11:06:17 host1 exiting on signal 15
Aug 15 11:06:29 host2 kernel: (0,0):o2net_state_change:512 connection to node host1 num 0 at 10.0.1.1:7777 has been idle for 10 seconds, shutting it down.
Aug 15 11:06:29 host2 kernel: (6638,0):o2net_set_nn_state:414 no longer connected to node host1 at 10.0.1.1:7777
Aug 15 11:06:47 host2 kernel: (6638,0):o2net_check_quorum:1468 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 0
Aug 15 11:06:47 host2 kernel: (6638,0):o2hb_stop_all_regions:1589 ERROR: stopping heartbeat on all active regions.
Aug 15 11:06:47 host2 kernel: Kernel panic: ocfs2 is very sorry to be fencing this system by panicing
Aug 15 11:06:47 host2 kernel:
Aug 15 11:10:38 host1 syslogd 1.4.1: restart.
Aug 15 11:10:43 host1 kernel: OCFS2 Node Manager 0.99.15-SLES Mon Jun 27 17:33:57 PDT 2005 (build sles)
Aug 15 11:10:43 host1 kernel: OCFS2 DLM 0.99.15-SLES Mon Jun 27 17:33:57 PDT 2005 (build sles)
Aug 15 11:10:43 host1 kernel: OCFS2 DLMFS 0.99.15-SLES Mon Jun 27 17:33:57 PDT 2005 (build sles)
Aug 15 11:10:43 host1 kernel: OCFS2 User DLM kernel interface loaded
Aug 15 11:10:53 host1 kernel: OCFS2 0.99.15-SLES Mon Jun 27 17:33:57 PDT 2005 (build sles)
Aug 15 11:10:53 host1 kernel: (7664,3):ocfs2_initialize_osb:1179 max_slots for this device: 2
Aug 15 11:10:53 host1 kernel: (7664,3):ocfs2_fill_local_node_info:851 I am node 0
Aug 15 11:10:53 host1 kernel: (7664,1):__dlm_print_nodes:379 Nodes in my domain ("F710595BBF68481A9D08F42DF6FCA92D"):
Aug 15 11:10:53 host1 kernel: (7664,1):__dlm_print_nodes:383 node 0
Aug 15 11:10:53 host1 kernel: (7664,1):ocfs2_find_slot:266 taking node slot 0
Aug 15 11:10:53 host1 kernel: kjournald starting. Commit interval 5 seconds
Aug 15 11:10:53 host1 kernel: ocfs2: Mounting device (8,16) on (node 0, slot 0)
Aug 15 11:10:57 host1 kernel: (7925,3):ocfs2_initialize_osb:1179 max_slots for this device: 4
Aug 15 11:10:57 host1 kernel: (7925,3):ocfs2_fill_local_node_info:851 I am node 0
Aug 15 11:10:57 host1 kernel: (7925,1):__dlm_print_nodes:379 Nodes in my domain ("D414ADF107F84066A6B32BBD6F25C55E"):
Aug 15 11:10:57 host1 kernel: (7925,1):__dlm_print_nodes:383 node 0
Aug 15 11:10:57 host1 kernel: (7925,3):ocfs2_find_slot:266 taking node slot 0
Aug 15 11:10:57 host1 kernel: kjournald starting. Commit interval 5 seconds
Aug 15 11:10:57 host1 kernel: ocfs2: Mounting device (8,48) on (node 0, slot 0)
Aug 15 11:10:57 host1 logger: (Oracle CSSD will be run out of init)
Aug 15 11:10:57 host1 logger: (Oracle EVMD will be run out of init)
Aug 15 11:10:57 host1 logger: (Oracle CRSD will be run out of init, set to start boot services)
Aug 15 11:14:03 host2 syslogd 1.4.1: restart.
Aug 15 11:14:09 host2 kernel: OCFS2 Node Manager 0.99.15-SLES Mon Jun 27 17:33:57 PDT 2005 (build sles)
Aug 15 11:14:09 host2 kernel: OCFS2 DLM 0.99.15-SLES Mon Jun 27 17:33:57 PDT 2005 (build sles)
Aug 15 11:14:09 host2 kernel: OCFS2 DLMFS 0.99.15-SLES Mon Jun 27 17:33:57 PDT 2005 (build sles)
Aug 15 11:14:09 host2 kernel: OCFS2 User DLM kernel interface loaded
Aug 15 11:14:14 host1 kernel: (6644,0):o2net_set_nn_state:431 accepted connection from node host2 num 1 at 10.0.1.2:7777
Aug 15 11:14:16 host2 kernel: (6643,0):o2net_set_nn_state:431 connected to node host1 num 0 at 10.0.1.1:7777
Aug 15 11:14:18 host1 kernel: (6644,0):__dlm_print_nodes:379 Nodes in my domain ("F710595BBF68481A9D08F42DF6FCA92D"):
Aug 15 11:14:18 host1 kernel: (6644,0):__dlm_print_nodes:383 node 0
Aug 15 11:14:18 host1 kernel: (6644,0):__dlm_print_nodes:383 node 1
Aug 15 11:14:20 host2 kernel: OCFS2 0.99.15-SLES Mon Jun 27 17:33:57 PDT 2005 (build sles)
Aug 15 11:14:20 host2 kernel: (7920,2):ocfs2_initialize_osb:1179 max_slots for this device: 2
Aug 15 11:14:20 host2 kernel: (7920,2):ocfs2_fill_local_node_info:851 I am node 1
Aug 15 11:14:20 host2 kernel: (7920,2):__dlm_print_nodes:379 Nodes in my domain ("F710595BBF68481A9D08F42DF6FCA92D"):
Aug 15 11:14:20 host2 kernel: (7920,2):__dlm_print_nodes:383 node 0
Aug 15 11:14:20 host2 kernel: (7920,2):__dlm_print_nodes:383 node 1
Aug 15 11:14:20 host2 kernel: (7920,2):ocfs2_find_slot:266 taking node slot 1
Aug 15 11:14:20 host2 kernel: kjournald starting. Commit interval 5 seconds
Aug 15 11:14:20 host2 kernel: ocfs2: Mounting device (8,16) on (node 1, slot 1)
Aug 15 11:14:22 host1 kernel: (6644,0):__dlm_print_nodes:379 Nodes in my domain ("D414ADF107F84066A6B32BBD6F25C55E"):
Aug 15 11:14:22 host1 kernel: (6644,0):__dlm_print_nodes:383 node 0
Aug 15 11:14:22 host1 kernel: (6644,0):__dlm_print_nodes:383 node 1
Aug 15 11:14:24 host2 kernel: (8069,3):ocfs2_initialize_osb:1179 max_slots for this device: 4
Aug 15 11:14:24 host2 kernel: (8069,3):ocfs2_fill_local_node_info:851 I am node 1
Aug 15 11:14:24 host2 kernel: (8069,1):__dlm_print_nodes:379 Nodes in my domain ("D414ADF107F84066A6B32BBD6F25C55E"):
Aug 15 11:14:24 host2 kernel: (8069,1):__dlm_print_nodes:383 node 0
Aug 15 11:14:24 host2 kernel: (8069,1):__dlm_print_nodes:383 node 1
Aug 15 11:14:24 host2 kernel: (8069,1):ocfs2_find_slot:266 taking node slot 1
Aug 15 11:14:24 host2 kernel: kjournald starting. Commit interval 5 seconds
Aug 15 11:14:24 host2 kernel: ocfs2: Mounting device (8,48) on (node 1, slot 1)
Aug 15 11:14:24 host2 logger: (Oracle CSSD will be run out of init)
Aug 15 11:14:24 host2 logger: (Oracle EVMD will be run out of init)
Aug 15 11:14:24 host2 logger: (Oracle CRSD will be run out of init, set to start boot services)
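Most of the log above is routine mount and DLM output; the entries that bear on the panic are the o2net connection changes and the quorum/heartbeat errors. As a rough aid for reading such a log, here is a minimal Python sketch that keeps only those entries. It assumes one syslog entry per line in the form shown above. The helper name `cluster_events` and the choice of which function names count as interesting are assumptions made for illustration, not part of OCFS2 or its tools.

```python
import re

# Matches kernel entries of the form seen in the log above, e.g.
#   "Aug 15 11:06:47 host2 kernel: (6638,0):o2net_check_quorum:1468 ERROR: ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: "
    r"\((?P<ctx>\d+,\d+)\):(?P<func>\w+):(?P<srcline>\d+) (?P<msg>.*)$"
)

# Function names copied from the log; this selection is an assumption
# about what is relevant to connectivity and fencing.
INTERESTING = {
    "o2net_state_change",
    "o2net_set_nn_state",
    "o2net_check_quorum",
    "o2hb_stop_all_regions",
}


def cluster_events(lines):
    """Return (timestamp, host, function, message) for the interesting entries."""
    events = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m and m.group("func") in INTERESTING:
            events.append(
                (m.group("stamp"), m.group("host"), m.group("func"), m.group("msg"))
            )
    return events
```

Feeding it the lines of /var/log/messages (for example `cluster_events(open("/var/log/messages"))`) leaves a short per-host timeline of disconnects, reconnects and the fencing decision.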
Zach Brown
2005-Aug-19 17:07 UTC
[Ocfs2-users] Kernel panic: ocfs2 is very sorry to be fencing this system by panicing
> When rebooting host2 - ocfs2 working very good. But by rebooting
> host1, on host2 in /var/log/messages "o2net_check_quorum:1468 ERROR:
> fencing this node

> Aug 15 10:55:55 host2 kernel: OCFS2 Node Manager 0.99.15-SLES Mon Jun 27 17:33:57 PDT 2005 (build sles)

Sadly, this version of OCFS2 had significant problems with when it decided to fence nodes. A newer version will almost certainly cure this problem for you. You could either wait for an updated SLES kernel that would include a more recent OCFS2 or you could build a current one yourself.

http://oss.oracle.com/projects/ocfs2/

- z
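
For readers wondering why only host2 panics, the wording of the error message suggests a tie-break rule: when a node can reach exactly half of the active nodes, it survives only if its half contains the lowest-numbered active node. The Python sketch below models that rule as read from the message text. It is not the kernel code, the function name `should_fence` is hypothetical, and the strict-majority branch is an assumption, since the log only shows the tie case.

```python
def should_fence(active, connected):
    """Decide whether a node fences itself, per the rule inferred from the log.

    active:    set of node numbers the node believes are alive (itself included)
    connected: set of node numbers it can reach over the network (itself included)
    """
    total = len(active)
    reachable = len(connected & active)
    lowest = min(active)

    if total % 2 == 0 and reachable * 2 == total:
        # Exact half ("half-quorum"): the half holding the lowest active
        # node wins the tie, the other half fences.
        return lowest not in connected

    # Assumption: outside the tie case a strict majority is required.
    return reachable * 2 < total


# host2 (node 1) loses its link to host1 (node 0) but still counts it as active:
host2_fences = should_fence(active={0, 1}, connected={1})

# host1 (node 0) loses its link to host2 (node 1) in the mirrored situation:
host1_fences = should_fence(active={0, 1}, connected={0})
```

Under these assumptions `host2_fences` is true and `host1_fences` is false, which would match the asymmetry in the report: in a two-node cluster, node 1 fences whenever it loses contact with a node 0 that it still considers active, including during a clean reboot of node 0.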