msl@calivia.com
2005-Jul-12 03:34 UTC
[Ocfs2-users] node gets fenced after mount of shared volume
My cluster fences the second node 10 seconds after I mount a shared volume, as if the nodes had lost IP connectivity. Here's what I do:

1) /etc/init.d/o2cb start on both nodes; modules load fine, Checking Heartbeat: Not Active (both nodes)
2) mount /u00 on node1; Checking heartbeat: Active (node1)
3) mount /u00 on node2; Checking heartbeat: Active (node2)

After 5 seconds on node1:

kernel: (20248,1):o2net_set_nn_state:437 accepted connection from node node2 num 1 at 10.1.7.53:7777
Jul 12 09:59:16 node1 kernel: (20248,1):__dlm_print_nodes:380 Nodes in my domain ("C69655D0DAE44FE2845FBA0E615269DD"):
Jul 12 09:59:16 node1 kernel: (20248,1):__dlm_print_nodes:384 node 0
Jul 12 09:59:16 node1 kernel: (20248,1):__dlm_print_nodes:384 node 1
Jul 12 09:59:33 node1 kernel: (0,1):o2net_idle_timer:1319 connection to node node2 num 1 at 10.1.7.53:7777 has been idle for 10 seconds, shutting it down.
Jul 12 09:59:33 node1 kernel: (20248,1):o2net_set_nn_state:420 no longer connected to node node2 at 10.1.7.53:7777
Jul 12 09:59:54 node1 kernel: (20486,1):ocfs2_replay_journal:1123 Recovering node 1 from slot 1 on device (253,5)

10 seconds later on node2:

node2 kernel: Kernel panic: ocfs2 is very sorry to be fencing this system by panicing

Running tcpdump -i eth1 port 7777 shows traffic as soon as I mount a shared LV on the second node.

We're running 0.99.16-BETA20 on SLES9 (final SLES9.SP2 download still in progress...). With 0.99.15-SLES from SLES9.SP2-RC4, communication between the nodes seemed to work, but a bug [1] prevented further tests.

Is this a known bug in 0.99.16-BETA20?

thanks,
Mike

For reference, this is my /etc/ocfs2/cluster.conf:

node:
	ip_port = 7777
	ip_address = 10.1.7.54
	number = 0
	name = node1
	cluster = OCFS2CLUSTER

node:
	ip_port = 7777
	ip_address = 10.1.7.53
	number = 1
	name = node2
	cluster = OCFS2CLUSTER

cluster:
	node_count = 2
	name = OCFS2CLUSTER

[1] http://oss.oracle.com/bugzilla/show_bug.cgi?id=511
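Before suspecting the network, it can help to rule out a cluster.conf mistake (mismatched node_count, duplicated node numbers or addresses). Below is a minimal Python sketch of such a sanity check, assuming only the stanza layout in Mike's file: a `node:` or `cluster:` header followed by indented `key = value` lines. The functions `parse_cluster_conf` and `check_cluster_conf`, and the particular checks they run, are hypothetical illustrations, not part of the o2cb tools.

```python
from typing import Dict, List, Tuple

# One parsed stanza: its header ("node" or "cluster") plus its key = value pairs.
Stanza = Tuple[str, Dict[str, str]]


def parse_cluster_conf(text: str) -> List[Stanza]:
    """Split cluster.conf-style text into (header, attributes) stanzas."""
    stanzas: List[Stanza] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith(":") and "=" not in line:
            stanzas.append((line[:-1].strip(), {}))  # new "node:" / "cluster:" header
        elif "=" in line and stanzas:
            key, value = (part.strip() for part in line.split("=", 1))
            stanzas[-1][1][key] = value  # attribute of the current stanza
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return stanzas


def check_cluster_conf(stanzas: List[Stanza]) -> List[str]:
    """Return human-readable problems; an empty list means no problems were found."""
    problems: List[str] = []
    nodes = [attrs for kind, attrs in stanzas if kind == "node"]
    clusters = {attrs.get("name"): attrs for kind, attrs in stanzas if kind == "cluster"}

    # node_count should match the number of node stanzas that name the cluster.
    for name, attrs in clusters.items():
        members = [n for n in nodes if n.get("cluster") == name]
        declared = int(attrs.get("node_count", "-1"))
        if declared != len(members):
            problems.append(f"cluster {name}: node_count={declared} but {len(members)} node stanzas")

    # Assumed requirements: each node names a declared cluster and has a
    # unique number, name and ip:port.
    seen: Dict[str, set] = {"number": set(), "name": set(), "endpoint": set()}
    for n in nodes:
        if n.get("cluster") not in clusters:
            problems.append(f"node {n.get('name')}: unknown cluster {n.get('cluster')!r}")
        endpoint = f"{n.get('ip_address')}:{n.get('ip_port')}"
        for key, value in (("number", n.get("number")), ("name", n.get("name")), ("endpoint", endpoint)):
            if value in seen[key]:
                problems.append(f"duplicate node {key}: {value}")
            seen[key].add(value)
    return problems
```

Fed the cluster.conf above, `check_cluster_conf(parse_cluster_conf(text))` returns an empty list, which is consistent with the problem lying somewhere other than the configuration file.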
Zach Brown
2005-Jul-12 12:46 UTC
> running tcpdump -i eth1 port 7777 shows traffic as soon as I mount a
> shared LV on the second node.

Can you send me that tcpdump off list? It's probably easiest to send a compressed binary dump of the packet stream (-s 1500 -w somefile; gzip somefile).

> Is this a known bug in 0.99.16-BETA20?

No, I haven't heard reports of this problem before.

- z
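Once such a dump exists (for example from `tcpdump -i eth1 -s 1500 -w somefile port 7777` followed by `gzip somefile`), one thing worth checking is whether the o2net connection really went quiet before node1 logged that it "has been idle for 10 seconds". Below is a minimal Python sketch, assuming a classic libpcap file captured on an Ethernet interface such as eth1. `read_pcap`, `largest_gap` and `summarize` are hypothetical helpers. IPv6 and VLAN-tagged frames are simply skipped.

```python
import gzip
import ipaddress
import struct
from typing import BinaryIO, List, NamedTuple

O2NET_PORT = 7777  # ip_port from the cluster.conf in this thread


class Packet(NamedTuple):
    ts: float          # capture timestamp, seconds since the epoch
    src: str
    dst: str
    sport: int
    dport: int
    payload_len: int   # TCP payload bytes according to the IP header


def read_pcap(f: BinaryIO, port: int = O2NET_PORT) -> List[Packet]:
    """Collect IPv4/TCP packets to or from `port` in a classic libpcap stream."""
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    magic = struct.unpack("<I", header[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"        # file written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"        # file written big-endian
    else:
        raise ValueError("not a microsecond-resolution libpcap file")
    _, _, _, _, _snaplen, linktype = struct.unpack(endian + "HHiIII", header[4:])
    if linktype != 1:
        raise ValueError(f"expected Ethernet (link type 1), got {linktype}")

    packets: List[Packet] = []
    while True:
        rec = f.read(16)
        if len(rec) < 16:
            break  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec)
        frame = f.read(incl_len)
        if len(frame) < incl_len:
            break  # truncated last record
        if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != 0x0800:
            continue  # not IPv4 over plain Ethernet (VLAN-tagged frames are skipped too)
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4
        if ihl < 20 or ip[9] != 6 or len(ip) < ihl + 20:
            continue  # malformed, not TCP, or TCP header cut off by the snaplen
        tcp = ip[ihl:]
        sport, dport = struct.unpack("!HH", tcp[:4])
        if port not in (sport, dport):
            continue
        total_len = struct.unpack("!H", ip[2:4])[0]
        data_offset = (tcp[12] >> 4) * 4
        packets.append(Packet(
            ts=ts_sec + ts_usec / 1e6,
            src=str(ipaddress.IPv4Address(ip[12:16])),
            dst=str(ipaddress.IPv4Address(ip[16:20])),
            sport=sport,
            dport=dport,
            payload_len=max(total_len - ihl - data_offset, 0),
        ))
    return packets


def largest_gap(packets: List[Packet]) -> float:
    """Longest silence, in seconds, between consecutive matching packets."""
    times = sorted(p.ts for p in packets)
    return max((b - a for a, b in zip(times, times[1:])), default=0.0)


def summarize(path: str) -> None:
    """Print the matching packets from a gzipped capture, then the largest gap."""
    with gzip.open(path, "rb") as f:
        packets = read_pcap(f)
    for p in packets:
        print(f"{p.ts:.6f} {p.src}:{p.sport} -> {p.dst}:{p.dport} len={p.payload_len}")
    print(f"largest gap: {largest_gap(packets):.3f} s")
```

Running `summarize("somefile.gz")` lists the port 7777 traffic in capture order. A gap of roughly 10 seconds just before the shutdown would match node1's log message. Steady traffic right up to that point would suggest looking elsewhere.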