netbsd at tango.lu
2017-Aug-08 12:09 UTC
[Ocfs2-users] Cluster name is invalid while trying to join the group
Hello list,

We are running a 3-node OCFS2 cluster based on Debian wheezy:

    ii  ocfs2-tools  1.6.4-1+deb7u1  amd64  tools for managing OCFS2 cluster filesystems

All 3 nodes run a custom 4.1.1 kernel.

Because we had performance issues with it, we decided to upgrade from wheezy to stretch. This was not truly an upgrade; rather, we installed a brand-new stretch node and moved the main configs over. The new node has:

    ii  ocfs2-tools  1.8.4-4  amd64  tools for managing OCFS2 cluster filesystems

The cluster.conf on all 3 machines:

    cluster:
        node_count = 3
        name = web

    node:
        number = 0
        cluster = web
        ip_port = 7777
        ip_address = 10.0.0.2
        name = webserver1

    node:
        number = 1
        cluster = web
        ip_port = 7777
        ip_address = 10.0.0.3
        name = webserver2

    node:
        number = 2
        cluster = web
        ip_port = 7777
        ip_address = 10.0.0.4
        name = webserver3

The new node fails to join the cluster with:

    Aug 8 13:56:28 webserver3 ocfs2[13529]: Stopping Oracle Cluster File System (OCFS2) OK
    Aug 8 14:00:40 webserver3 ocfs2[13581]: Starting Oracle Cluster File System (OCFS2) mount.ocfs2: Cluster name is invalid while trying to join the group
    Aug 8 14:00:40 webserver3 ocfs2[13581]: mount.ocfs2: Cluster name is invalid while trying to join the group
    Aug 8 14:00:40 webserver3 ocfs2[13581]: Failed
    Aug 8 14:01:02 webserver3 ocfs2[13646]: Stopping Oracle Cluster File System (OCFS2) OK
    Aug 8 14:01:02 webserver3 ocfs2[13657]: Starting Oracle Cluster File System (OCFS2) mount.ocfs2: Cluster name is invalid while trying to join the group
    Aug 8 14:01:02 webserver3 ocfs2[13657]: mount.ocfs2: Cluster name is invalid while trying to join the group
    Aug 8 14:01:02 webserver3 ocfs2[13657]: Failed
    Aug 8 14:01:27 webserver3 ocfs2[452]: Starting Oracle Cluster File System (OCFS2) mount.ocfs2: Cluster name is invalid while trying to join the group
    Aug 8 14:01:27 webserver3 ocfs2[452]: mount.ocfs2: Cluster name is invalid while trying to join the group
    Aug 8 14:01:27 webserver3 ocfs2[452]: Failed

We have tried everything, from shutting down 1 of the 3 nodes to shutting down all nodes except the new one. The new node still cannot start. We have also migrated the old 4.1.1 kernel with the exact same modules to the new webserver3; that did not help either.

Any ideas what is causing this?

PR,
Claude
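
For reference, the internal consistency of the cluster.conf quoted above can be checked mechanically. The following is only a minimal sketch in Python: the function names parse_cluster_conf and check_cluster_conf are hypothetical (they are not part of ocfs2-tools), and the parser assumes nothing beyond the "stanza header, then key = value lines" layout shown in the post. It verifies that every node refers to a declared cluster name, that node_count matches the number of node stanzas, and that node numbers are unique. A clean result only rules out a typo in the file; it does not by itself explain the mount.ocfs2 error.

```python
def parse_cluster_conf(text):
    """Parse 'cluster:' / 'node:' stanzas followed by 'key = value' lines."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":") and "=" not in line:
            # A stanza header such as "cluster:" or "node:".
            current = {"_type": line[:-1]}
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return stanzas


def check_cluster_conf(text):
    """Return a list of human-readable consistency problems (empty if none)."""
    stanzas = parse_cluster_conf(text)
    clusters = [s for s in stanzas if s["_type"] == "cluster"]
    nodes = [s for s in stanzas if s["_type"] == "node"]
    problems = []

    cluster_names = {c.get("name") for c in clusters}
    for n in nodes:
        if n.get("cluster") not in cluster_names:
            problems.append(
                "node %s refers to unknown cluster %r"
                % (n.get("name"), n.get("cluster"))
            )

    for c in clusters:
        members = [n for n in nodes if n.get("cluster") == c.get("name")]
        if c.get("node_count") != str(len(members)):
            problems.append(
                "cluster %s declares node_count = %s but has %d node stanzas"
                % (c.get("name"), c.get("node_count"), len(members))
            )

    numbers = [n.get("number") for n in nodes]
    if len(set(numbers)) != len(numbers):
        problems.append("duplicate node numbers")

    return problems


# The configuration exactly as quoted in the post.
SAMPLE = """
cluster:
    node_count = 3
    name = web

node:
    number = 0
    cluster = web
    ip_port = 7777
    ip_address = 10.0.0.2
    name = webserver1

node:
    number = 1
    cluster = web
    ip_port = 7777
    ip_address = 10.0.0.3
    name = webserver2

node:
    number = 2
    cluster = web
    ip_port = 7777
    ip_address = 10.0.0.4
    name = webserver3
"""

if __name__ == "__main__":
    for problem in check_cluster_conf(SAMPLE):
        print(problem)
```

Run against the quoted configuration, the check reports no problems, which suggests the stanzas themselves are consistent and the cause lies elsewhere.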