Jonathan Ramsay
2015-Aug-05 16:38 UTC
[Ocfs2-users] Previously working cluster - now one node cannot connect
Hello,

I have a four-node cluster that was working fine until our DHCP/DNS went down. The cluster will now not show up on one host. This host DID change IP addresses, as its address was not static. I updated /etc/ocfs2/cluster.conf on all four nodes. They are all identical.

cluster:
	node_count = 4
	name = saturn

node:
	number = 0
	cluster = saturn
	ip_port = 7777
	ip_address = 10.0.0.11
	name = nile

node:
	number = 1
	cluster = saturn
	ip_port = 7777
	ip_address = 10.0.0.32
	name = rio

node:
	number = 2
	cluster = saturn
	ip_port = 7777
	ip_address = 10.0.0.30
	name = mekong

node:
	number = 3
	cluster = saturn
	ip_port = 7777
	ip_address = 10.0.0.13
	name = volga

Whether I mount via fstab or directly, I get:

root@mekong:~# service ocfs2 reload
Starting Oracle Cluster File System (OCFS2) mount.ocfs2: Invalid argument while mounting /dev/sda on /titan. Check 'dmesg' for more information on this error.

mount -t ocfs2 /dev/sda /titan
mount.ocfs2: Invalid argument while mounting /dev/sda on /titan. Check 'dmesg' for more information on this error.

And in dmesg I get:

[Wed Aug 5 12:20:27 2015] o2net: Connected to node nile (num 0) at 10.0.0.11:7777
[Wed Aug 5 12:20:31 2015] (mount.ocfs2,25447,3):dlm_send_nodeinfo:1291 ERROR: node mismatch -22, node 0
[Wed Aug 5 12:20:31 2015] (mount.ocfs2,25447,3):dlm_try_to_join_domain:1675 ERROR: status = -22
[Wed Aug 5 12:20:31 2015] (mount.ocfs2,25447,3):dlm_join_domain:1945 ERROR: status = -22
[Wed Aug 5 12:20:31 2015] (mount.ocfs2,25447,1):dlm_register_domain:2204 ERROR: status = -22
[Wed Aug 5 12:20:31 2015] (mount.ocfs2,25447,1):o2cb_cluster_connect:368 ERROR: status = -22
[Wed Aug 5 12:20:31 2015] (mount.ocfs2,25447,1):ocfs2_dlm_init:3004 ERROR: status = -22
[Wed Aug 5 12:20:31 2015] (mount.ocfs2,25447,1):ocfs2_mount_volume:1881 ERROR: status = -22
[Wed Aug 5 12:20:31 2015] ocfs2: Unmounting device (8,0) on (node 0)
[Wed Aug 5 12:20:31 2015] (mount.ocfs2,25447,1):ocfs2_fill_super:1229 ERROR: status = -22
[Wed Aug 5 12:20:33 2015] o2net: No longer connected to node nile (num 0) at 10.0.0.11:7777
[Wed Aug 5 12:21:13 2015] o2net: Connected to node nile (num 0) at 10.0.0.11:7777
[Wed Aug 5 12:21:17 2015] (mount.ocfs2,25470,2):dlm_send_nodeinfo:1291 ERROR: node mismatch -22, node 0
[Wed Aug 5 12:21:17 2015] (mount.ocfs2,25470,2):dlm_try_to_join_domain:1675 ERROR: status = -22
[Wed Aug 5 12:21:17 2015] (mount.ocfs2,25470,2):dlm_join_domain:1945 ERROR: status = -22
[Wed Aug 5 12:21:17 2015] (mount.ocfs2,25470,2):dlm_register_domain:2204 ERROR: status = -22

I have found several "node mismatch" and "ERROR: status = -22" reports, but none seem applicable.

Any suggestions welcome.

Thanks,

J.R.
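Since the question turns on whether the cluster.conf files are internally consistent, here is a minimal Python sketch of a sanity check for one such file. It is not part of the o2cb tooling; the function names `parse_stanzas` and `check` are made up for illustration, and the parser only assumes the "stanza header followed by key = value lines" layout shown in the post.

```python
from collections import Counter

# The configuration from the post (this parser ignores indentation).
SAMPLE = """\
cluster:
    node_count = 4
    name = saturn

node:
    number = 0
    cluster = saturn
    ip_port = 7777
    ip_address = 10.0.0.11
    name = nile

node:
    number = 1
    cluster = saturn
    ip_port = 7777
    ip_address = 10.0.0.32
    name = rio

node:
    number = 2
    cluster = saturn
    ip_port = 7777
    ip_address = 10.0.0.30
    name = mekong

node:
    number = 3
    cluster = saturn
    ip_port = 7777
    ip_address = 10.0.0.13
    name = volga
"""


def parse_stanzas(text):
    """Return a list of (stanza_kind, {key: value}) in file order."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(":") and "=" not in line:
            # A header such as "cluster:" or "node:" starts a new stanza.
            stanzas.append((line[:-1], {}))
        elif "=" in line and stanzas:
            key, _, value = line.partition("=")
            stanzas[-1][1][key.strip()] = value.strip()
    return stanzas


def check(stanzas):
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    clusters = [kv for kind, kv in stanzas if kind == "cluster"]
    nodes = [kv for kind, kv in stanzas if kind == "node"]

    # node_count should equal the number of node stanzas for that cluster.
    for cluster in clusters:
        members = [n for n in nodes if n.get("cluster") == cluster.get("name")]
        if str(len(members)) != cluster.get("node_count"):
            problems.append(
                f"cluster {cluster.get('name')}: node_count = "
                f"{cluster.get('node_count')} but {len(members)} node stanzas"
            )

    # Every node should name a cluster that is defined in the file.
    known = {c.get("name") for c in clusters}
    for node in nodes:
        if node.get("cluster") not in known:
            problems.append(f"node {node.get('name')}: unknown cluster {node.get('cluster')}")

    # Node numbers, addresses and names should each be unique.
    for field in ("number", "ip_address", "name"):
        counts = Counter(n.get(field) for n in nodes)
        for value, count in counts.items():
            if count > 1:
                problems.append(f"{field} {value} is used by {count} node stanzas")
    return problems


if __name__ == "__main__":
    for problem in check(parse_stanzas(SAMPLE)) or ["no problems found"]:
        print(problem)
```

A check like this only looks at the file on disk. It cannot show what the running cluster stack on each node has loaded.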
A. C. Censi
2015-Aug-05 22:16 UTC
[Ocfs2-users] Previously working cluster - now one node cannot connect
See 2nd message in this thread: http://comments.gmane.org/gmane.comp.file-systems.ocfs2.user/6014

--
A C Censi
accensi @ gmail . com
accensi @ montreal . com . br
Joseph Qi
2015-Aug-06 00:58 UTC
[Ocfs2-users] Previously working cluster - now one node cannot connect
Please check the node config in configfs on each node:

/sys/kernel/config/cluster/<your_cluster_name>/node/

If it is not the same, try taking the cluster offline and then bringing it online again, which will reload the config from cluster.conf.
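The comparison suggested here can be scripted. The Python sketch below is one way to do it, under assumptions: each node directory in configfs is taken to expose the attribute files `num`, `ipv4_address` and `ipv4_port`, and the helper names (`read_conf_nodes`, `read_configfs_nodes`, `diff_nodes`) are hypothetical, not part of any OCFS2 tool. The cluster name `saturn` comes from the original post.

```python
import os


def read_conf_nodes(conf_path):
    """Parse the node stanzas of cluster.conf into {name: (number, ip, port)}."""
    stanzas = []
    current = None
    with open(conf_path) as fh:
        for raw in fh:
            line = raw.strip()
            if line.endswith(":") and "=" not in line:
                # New stanza; only "node:" stanzas are kept.
                current = {}
                if line == "node:":
                    stanzas.append(current)
            elif "=" in line and current is not None:
                key, _, value = line.partition("=")
                current[key.strip()] = value.strip()
    return {s["name"]: (s["number"], s["ip_address"], s["ip_port"]) for s in stanzas}


def _read_attr(node_path, attr):
    with open(os.path.join(node_path, attr)) as fh:
        return fh.read().strip()


def read_configfs_nodes(node_dir):
    """Read the live node entries into {name: (number, ip, port)}.

    Assumes one sub-directory per node containing the files
    num, ipv4_address and ipv4_port.
    """
    nodes = {}
    for name in sorted(os.listdir(node_dir)):
        path = os.path.join(node_dir, name)
        if os.path.isdir(path):
            nodes[name] = (
                _read_attr(path, "num"),
                _read_attr(path, "ipv4_address"),
                _read_attr(path, "ipv4_port"),
            )
    return nodes


def diff_nodes(conf, live):
    """Return a list of differences between cluster.conf and configfs."""
    problems = []
    for name in sorted(set(conf) | set(live)):
        if name not in live:
            problems.append(f"{name}: in cluster.conf but not in configfs")
        elif name not in conf:
            problems.append(f"{name}: in configfs but not in cluster.conf")
        elif conf[name] != live[name]:
            problems.append(
                f"{name}: cluster.conf has {conf[name]}, configfs has {live[name]}"
            )
    return problems


if __name__ == "__main__":
    conf_path = "/etc/ocfs2/cluster.conf"
    node_dir = "/sys/kernel/config/cluster/saturn/node"  # cluster name from the post
    if os.path.isfile(conf_path) and os.path.isdir(node_dir):
        found = diff_nodes(read_conf_nodes(conf_path), read_configfs_nodes(node_dir))
        for problem in found or ["configfs matches cluster.conf"]:
            print(problem)
    else:
        print("cluster.conf or the configfs node directory was not found on this host")
```

Run on every node, a script like this would flag a host whose loaded configuration still carries an old address even though the cluster.conf files on disk are identical.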