Soumya Koduri
2015-Sep-22 07:04 UTC
[Gluster-users] Fwd: nfs-ganesha HA with arbiter volume
Hi Tiemen,

I have added the steps to configure HA NFS in the doc below. Please verify that you have all the prerequisites in place and have performed the steps correctly.

https://github.com/soumyakoduri/glusterdocs/blob/ha_guide/Administrator%20Guide/Configuring%20HA%20NFS%20Server.md

Thanks,
Soumya

On 09/21/2015 09:21 PM, Tiemen Ruiten wrote:
> Whoops, replied off-list.
>
> Additionally I noticed that the generated corosync config is not valid,
> as there is no interface section:
>
> /etc/corosync/corosync.conf
>
> totem {
>   version: 2
>   secauth: off
>   cluster_name: rd-ganesha-ha
>   transport: udpu
> }
>
> nodelist {
>   node {
>     ring0_addr: cobalt
>     nodeid: 1
>   }
>   node {
>     ring0_addr: iron
>     nodeid: 2
>   }
> }
>
> quorum {
>   provider: corosync_votequorum
>   two_node: 1
> }
>
> logging {
>   to_syslog: yes
> }
>
> ---------- Forwarded message ----------
> From: Tiemen Ruiten <t.ruiten at rdmedia.com>
> Date: 21 September 2015 at 17:16
> Subject: Re: [Gluster-users] nfs-ganesha HA with arbiter volume
> To: Jiffin Tony Thottan <jthottan at redhat.com>
>
> Could you point me to the latest documentation? I've been struggling to
> find something up-to-date. I believe I have all the prerequisites:
>
> - shared storage volume exists and is mounted
> - all nodes in hosts files
> - Gluster-NFS disabled
> - corosync, pacemaker and nfs-ganesha RPMs installed
>
> Anything I missed?
>
> Everything has been installed by RPM, so it is in the default locations:
> /usr/libexec/ganesha/ganesha-ha.sh
> /etc/ganesha/ganesha.conf (empty)
> /etc/ganesha/ganesha-ha.conf
>
> After I started the pcsd service manually, nfs-ganesha could be enabled
> successfully, but there was no virtual IP present on the interfaces and,
> looking at the system log, I noticed corosync failed to start:
>
> - on the host where I issued the gluster nfs-ganesha enable command:
>
> Sep 21 17:07:18 iron systemd: Starting NFS-Ganesha file server...
> Sep 21 17:07:19 iron systemd: Started NFS-Ganesha file server.
> Sep 21 17:07:19 iron rpc.statd[2409]: Received SM_UNMON_ALL request from iron.int.rdmedia.com while not monitoring any hosts
> Sep 21 17:07:20 iron systemd: Starting Corosync Cluster Engine...
> Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
> Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing transport (UDP/IP Unicast).
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] The network interface [10.100.30.38] is now up.
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded: corosync configuration map access [0]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cmap
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded: corosync configuration service [1]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cfg
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cpg
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded: corosync profile loading service [4]
> Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Using quorum provider corosync_votequorum
> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: votequorum
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: quorum
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member {10.100.30.38}
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member {10.100.30.37}
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership (10.100.30.38:104) was formed. Members joined: 1
> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Members[1]: 1
> Sep 21 17:07:20 iron corosync[3427]: [MAIN  ] Completed service synchronization, ready to provide service.
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership (10.100.30.37:108) was formed. Members joined: 1
> Sep 21 17:08:21 iron corosync: Starting Corosync Cluster Engine (corosync): [FAILED]
> Sep 21 17:08:21 iron systemd: corosync.service: control process exited, code=exited status=1
> Sep 21 17:08:21 iron systemd: Failed to start Corosync Cluster Engine.
> Sep 21 17:08:21 iron systemd: Unit corosync.service entered failed state.
>
> - on the other host:
>
> Sep 21 17:07:19 cobalt systemd: Starting Preprocess NFS configuration...
> Sep 21 17:07:19 cobalt systemd: Starting RPC Port Mapper.
> Sep 21 17:07:19 cobalt systemd: Reached target RPC Port Mapper.
> Sep 21 17:07:19 cobalt systemd: Starting Host and Network Name Lookups.
> Sep 21 17:07:19 cobalt systemd: Reached target Host and Network Name Lookups.
> Sep 21 17:07:19 cobalt systemd: Starting RPC bind service...
> Sep 21 17:07:19 cobalt systemd: Started Preprocess NFS configuration.
> Sep 21 17:07:19 cobalt systemd: Started RPC bind service.
> Sep 21 17:07:19 cobalt systemd: Starting NFS status monitor for NFSv2/3 locking....
> Sep 21 17:07:19 cobalt rpc.statd[2662]: Version 1.3.0 starting
> Sep 21 17:07:19 cobalt rpc.statd[2662]: Flags: TI-RPC
> Sep 21 17:07:19 cobalt systemd: Started NFS status monitor for NFSv2/3 locking..
> Sep 21 17:07:19 cobalt systemd: Starting NFS-Ganesha file server...
> Sep 21 17:07:19 cobalt systemd: Started NFS-Ganesha file server.
> Sep 21 17:07:19 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit capabilities (legacy support in use)
> Sep 21 17:07:19 cobalt logger: setting up rd-ganesha-ha
> Sep 21 17:07:19 cobalt rpc.statd[2662]: Received SM_UNMON_ALL request from cobalt.int.rdmedia.com while not monitoring any hosts
> Sep 21 17:07:19 cobalt logger: setting up cluster rd-ganesha-ha with the following cobalt iron
> Sep 21 17:07:20 cobalt systemd: Stopped Pacemaker High Availability Cluster Manager.
> Sep 21 17:07:20 cobalt systemd: Stopped Corosync Cluster Engine.
> Sep 21 17:07:20 cobalt systemd: Reloading.
> Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
> Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
> Sep 21 17:07:20 cobalt systemd: Reloading.
> Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
> Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
> Sep 21 17:07:20 cobalt systemd: Starting Corosync Cluster Engine...
> Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
> Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
> Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing transport (UDP/IP Unicast).
> Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] The network interface [10.100.30.37] is now up.
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded: corosync configuration map access [0]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cmap
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded: corosync configuration service [1]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cfg
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cpg
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded: corosync profile loading service [4]
> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Using quorum provider corosync_votequorum
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: votequorum
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: quorum
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member {10.100.30.37}
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member {10.100.30.38}
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership (10.100.30.37:100) was formed. Members joined: 1
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
> Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service synchronization, ready to provide service.
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership (10.100.30.37:108) was formed. Members joined: 1
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
> Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service synchronization, ready to provide service.
> Sep 21 17:08:50 cobalt systemd: corosync.service operation timed out. Terminating.
> Sep 21 17:08:50 cobalt corosync: Starting Corosync Cluster Engine (corosync):
> Sep 21 17:08:50 cobalt systemd: Failed to start Corosync Cluster Engine.
> Sep 21 17:08:50 cobalt systemd: Unit corosync.service entered failed state.
> Sep 21 17:08:55 cobalt logger: warning: pcs property set no-quorum-policy=ignore failed
> Sep 21 17:08:55 cobalt logger: warning: pcs property set stonith-enabled=false failed
> Sep 21 17:08:55 cobalt logger: warning: pcs resource create nfs_start ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
> Sep 21 17:08:56 cobalt logger: warning: pcs resource delete nfs_start-clone failed
> Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-mon ganesha_mon --clone failed
> Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-grace ganesha_grace --clone failed
> Sep 21 17:08:57 cobalt logger: warning pcs resource create cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor interval=15s failed
> Sep 21 17:08:57 cobalt logger: warning: pcs resource create cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order cobalt-trigger_ip-1 then nfs-grace-clone failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order nfs-grace-clone then cobalt-cluster_ip-1 failed
> Sep 21 17:08:57 cobalt logger: warning pcs resource create iron-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor interval=15s failed
> Sep 21 17:08:57 cobalt logger: warning: pcs resource create iron-trigger_ip-1 ocf:heartbeat:Dummy failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add iron-cluster_ip-1 with iron-trigger_ip-1 failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order iron-trigger_ip-1 then nfs-grace-clone failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint order nfs-grace-clone then iron-cluster_ip-1 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers iron=1000 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers cobalt=2000 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers cobalt=1000 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers iron=2000 failed
> Sep 21 17:08:58 cobalt logger: warning pcs cluster cib-push /tmp/tmp.nXTfyA1GMR failed
> Sep 21 17:08:58 cobalt logger: warning: scp ganesha-ha.conf to cobalt failed
>
> BTW, I'm using CentOS 7.
> There are multiple network interfaces on the servers, could that be a problem?
>
>
> On 21 September 2015 at 11:48, Jiffin Tony Thottan <jthottan at redhat.com> wrote:
>
> On 21/09/15 13:56, Tiemen Ruiten wrote:
>> Hello Soumya, Kaleb, list,
>>
>> This Friday I created the gluster_shared_storage volume manually,
>> I just tried it with the command you supplied, but both have the
>> same result:
>>
>> from etc-glusterfs-glusterd.vol.log on the node where I issued the
>> command:
>>
>> [2015-09-21 07:59:47.756845] I [MSGID: 106474] [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha host found Hostname is cobalt
>> [2015-09-21 07:59:48.071755] I [MSGID: 106474] [glusterd-ganesha.c:349:is_ganesha_host] 0-management: ganesha host found Hostname is cobalt
>> [2015-09-21 07:59:48.653879] E [MSGID: 106470] [glusterd-ganesha.c:264:glusterd_op_set_ganesha] 0-management: Initial NFS-Ganesha set up failed
>
> As far as I understand from the logs, it called setup_cluster() [which calls the `ganesha-ha.sh` script], but the script failed.
> Can you please provide the following details:
> - Location of the ganesha.sh file?
> - Location of the ganesha-ha.conf and ganesha.conf files?
>
> And also, can you cross-check whether all the prerequisites before the HA setup are satisfied?
>
> --
> With Regards,
> Jiffin
>
>> [2015-09-21 07:59:48.653912] E [MSGID: 106123] [glusterd-syncop.c:1404:gd_commit_op_phase] 0-management: Commit of operation 'Volume (null)' failed on localhost : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
>> [2015-09-21 07:59:45.402458] I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
>> [2015-09-21 07:59:48.071578] I [MSGID: 106474] [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha host found Hostname is cobalt
>>
>> from etc-glusterfs-glusterd.vol.log on the other node:
>>
>> [2015-09-21 08:12:50.111877] E [MSGID: 106062] [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable to acquire volname
>> [2015-09-21 08:14:50.548087] E [MSGID: 106062] [glusterd-op-sm.c:3635:glusterd_op_ac_lock] 0-management: Unable to acquire volname
>> [2015-09-21 08:14:50.654746] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
>> [2015-09-21 08:14:50.655095] I [MSGID: 106474] [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha host found Hostname is cobalt
>> [2015-09-21 08:14:51.287156] E [MSGID: 106062] [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable to acquire volname
>>
>> from etc-glusterfs-glusterd.vol.log on the arbiter node:
>>
>> [2015-09-21 08:18:50.934713] E [MSGID: 101075] [common-utils.c:3127:gf_is_local_addr] 0-management: error in getaddrinfo: Name or service not known
>> [2015-09-21 08:18:51.504694] E [MSGID: 106062] [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable to acquire volname
>>
>> I have put the hostnames of all servers in my /etc/hosts file,
>> including the arbiter node.
>>
>> On 18 September 2015 at 16:52, Soumya Koduri <skoduri at redhat.com> wrote:
>>
>> Hi Tiemen,
>>
>> One of the pre-requisites before setting up nfs-ganesha HA is
>> to create and mount the shared_storage volume. Use the below CLI for that:
>>
>> "gluster volume set all cluster.enable-shared-storage enable"
>>
>> It shall create the volume and mount it on all the nodes
>> (including the arbiter node). Note this volume shall be
>> mounted on all the nodes of the gluster storage pool (though
>> in this case it may not be part of the nfs-ganesha cluster).
>>
>> So instead of manually creating those directory paths, please
>> use the above CLI and try re-configuring the setup.
>>
>> Thanks,
>> Soumya
>>
>> On 09/18/2015 07:29 PM, Tiemen Ruiten wrote:
>>
>> Hello Kaleb,
>>
>> I don't:
>>
>> # Name of the HA cluster created.
>> # must be unique within the subnet
>> HA_NAME="rd-ganesha-ha"
>> #
>> # The gluster server from which to mount the shared data volume.
>> HA_VOL_SERVER="iron"
>> #
>> # N.B. you may use short names or long names; you may not use IP addrs.
>> # Once you select one, stay with it as it will be mildly unpleasant to
>> # clean up if you switch later on. Ensure that all names - short and/or
>> # long - are in DNS or /etc/hosts on all machines in the cluster.
>> #
>> # The subset of nodes of the Gluster Trusted Pool that form the ganesha
>> # HA cluster. Hostname is specified.
>> HA_CLUSTER_NODES="cobalt,iron"
>> #HA_CLUSTER_NODES="server1.lab.redhat.com,server2.lab.redhat.com,..."
>> #
>> # Virtual IPs for each of the nodes specified above.
>> VIP_server1="10.100.30.101"
>> VIP_server2="10.100.30.102"
>> #VIP_server1_lab_redhat_com="10.0.2.1"
>> #VIP_server2_lab_redhat_com="10.0.2.2"
>>
>> hosts cobalt & iron are the data nodes, the arbiter ip/hostname (neon)
>> isn't mentioned anywhere in this config file.
>>
>> On 18 September 2015 at 15:56, Kaleb S. KEITHLEY <kkeithle at redhat.com> wrote:
>>
>>    On 09/18/2015 09:46 AM, Tiemen Ruiten wrote:
>>    > Hello,
>>    >
>>    > I have a Gluster cluster with a single replica 3, arbiter 1 volume (so
>>    > two nodes with actual data, one arbiter node). I would like to setup
>>    > NFS-Ganesha HA for this volume but I'm having some difficulties.
>>    >
>>    > - I needed to create a directory /var/run/gluster/shared_storage
>>    > manually on all nodes, or the command 'gluster nfs-ganesha enable' would
>>    > fail with the following error:
>>    > [2015-09-18 13:13:34.690416] E [MSGID: 106032]
>>    > [glusterd-ganesha.c:708:pre_setup] 0-THIS->name: mkdir() failed on path
>>    > /var/run/gluster/shared_storage/nfs-ganesha, [No such file or directory]
>>    >
>>    > - Then I found out that the command connects to the arbiter node as
>>    > well, but obviously I don't want to set up NFS-Ganesha there. Is it
>>    > actually possible to setup NFS-Ganesha HA with an arbiter node? If it's
>>    > possible, is there any documentation on how to do that?
>>    >
>>
>>    Please send the /etc/ganesha/ganesha-ha.conf file you're using.
>>
>>    Probably you have included the arbiter in your HA config; that would be
>>    a mistake.
>>
>>    --
>>    Kaleb
>>
>> --
>> Tiemen Ruiten
>> Systems Engineer
>> R&D Media
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>
> --
> Tiemen Ruiten
> Systems Engineer
> R&D Media
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
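A minimal shell sketch of the prerequisites discussed above, assuming Gluster 3.7-era CLI on CentOS 7. The package list and the example volume name "myvol" are assumptions rather than details from the thread; the guide Soumya links above remains the authoritative sequence.

    # 1. Shared storage volume: create and mount it on all pool members (the CLI Soumya mentions).
    gluster volume set all cluster.enable-shared-storage enable

    # 2. Disable Gluster-NFS on the volume that will be exported through NFS-Ganesha.
    gluster volume set myvol nfs.disable on

    # 3. Install the HA stack on the nodes that will run NFS-Ganesha (not the arbiter).
    #    Exact package names may differ per repository.
    yum install -y corosync pacemaker pcs nfs-ganesha nfs-ganesha-gluster glusterfs-ganesha

    # 4. Passwordless root SSH between the HA nodes (Tiemen notes this was missing at first).
    ssh-keygen -t rsa          # if no key exists yet
    ssh-copy-id root@cobalt
    ssh-copy-id root@iron

    # 5. pcsd must be running before "gluster nfs-ganesha enable" (it was started manually in the thread).
    systemctl enable pcsd && systemctl start pcsd

    # 6. Make sure every node, including the arbiter, resolves all hostnames (DNS or /etc/hosts),
    #    then kick off the HA setup from one node:
    gluster nfs-ganesha enable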
Tiemen Ruiten
2015-Sep-22 09:05 UTC
[Gluster-users] Fwd: nfs-ganesha HA with arbiter volume
I had missed setting up passwordless SSH auth for the root user. However, fixing that did not make a difference. After verifying the prerequisites, I issued gluster nfs-ganesha enable on node cobalt:

Sep 22 10:19:56 cobalt systemd: Starting Preprocess NFS configuration...
Sep 22 10:19:56 cobalt systemd: Starting RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Reached target RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Starting Host and Network Name Lookups.
Sep 22 10:19:56 cobalt systemd: Reached target Host and Network Name Lookups.
Sep 22 10:19:56 cobalt systemd: Starting RPC bind service...
Sep 22 10:19:56 cobalt systemd: Started Preprocess NFS configuration.
Sep 22 10:19:56 cobalt systemd: Started RPC bind service.
Sep 22 10:19:56 cobalt systemd: Starting NFS status monitor for NFSv2/3 locking....
Sep 22 10:19:56 cobalt rpc.statd[2666]: Version 1.3.0 starting
Sep 22 10:19:56 cobalt rpc.statd[2666]: Flags: TI-RPC
Sep 22 10:19:56 cobalt systemd: Started NFS status monitor for NFSv2/3 locking..
Sep 22 10:19:56 cobalt systemd: Starting NFS-Ganesha file server...
Sep 22 10:19:56 cobalt systemd: Started NFS-Ganesha file server.
Sep 22 10:19:56 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit capabilities (legacy support in use)
Sep 22 10:19:56 cobalt rpc.statd[2666]: Received SM_UNMON_ALL request from cobalt.int.rdmedia.com while not monitoring any hosts
Sep 22 10:19:56 cobalt logger: setting up rd-ganesha-ha
Sep 22 10:19:56 cobalt logger: setting up cluster rd-ganesha-ha with the following cobalt iron
Sep 22 10:19:57 cobalt systemd: Stopped Pacemaker High Availability Cluster Manager.
Sep 22 10:19:57 cobalt systemd: Stopped Corosync Cluster Engine.
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Starting Corosync Cluster Engine...
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] The network interface [10.100.30.37] is now up.
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded: corosync configuration map access [0]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cmap
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded: corosync configuration service [1]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cfg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cpg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded: corosync profile loading service [4]
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Using quorum provider corosync_votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: quorum
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member {10.100.30.37}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member {10.100.30.38}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership (10.100.30.37:140) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership (10.100.30.37:148) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Members[0]:
Sep 22 10:19:58 cobalt corosync[2816]: [MAIN  ] Completed service synchronization, ready to provide service.
*Sep 22 10:21:27 cobalt systemd: corosync.service operation timed out. Terminating.*
*Sep 22 10:21:27 cobalt corosync: Starting Corosync Cluster Engine (corosync):*
*Sep 22 10:21:27 cobalt systemd: Failed to start Corosync Cluster Engine.*
*Sep 22 10:21:27 cobalt systemd: Unit corosync.service entered failed state.*
Sep 22 10:21:32 cobalt logger: warning: pcs property set no-quorum-policy=ignore failed
Sep 22 10:21:32 cobalt logger: warning: pcs property set stonith-enabled=false failed
Sep 22 10:21:32 cobalt logger: warning: pcs resource create nfs_start ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource delete nfs_start-clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-mon ganesha_mon --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-grace ganesha_grace --clone failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.101 cidr_netmask=32 op monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order cobalt-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order nfs-grace-clone then cobalt-cluster_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create iron-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.102 cidr_netmask=32 op monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create iron-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add iron-cluster_ip-1 with iron-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order iron-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint order nfs-grace-clone then iron-cluster_ip-1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers iron=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers cobalt=2000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers cobalt=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers iron=2000 failed
Sep 22 10:21:35 cobalt logger: warning pcs cluster cib-push /tmp/tmp.yqLT4m75WG failed

Notice the failed corosync service in bold. I can't find any logs pointing to a reason. Starting it manually is not a problem:

Sep 22 10:35:06 cobalt corosync: Starting Corosync Cluster Engine (corosync): [ OK ]

Then I noticed pacemaker was not running on both nodes.
Started it manually and saw the following in /var/log/messages on the other node:

Sep 22 10:36:43 iron cibadmin[4654]: notice: Invoked: /usr/sbin/cibadmin --replace -o configuration -V --xml-pipe
Sep 22 10:36:43 iron crmd[4617]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Sep 22 10:36:44 iron pengine[4616]: notice: On loss of CCM Quorum: Ignore
Sep 22 10:36:44 iron pengine[4616]: error: Resource start-up disabled since no STONITH resources have been defined
Sep 22 10:36:44 iron pengine[4616]: error: Either configure some or disable STONITH with the stonith-enabled option
Sep 22 10:36:44 iron pengine[4616]: error: NOTE: Clusters with shared data need STONITH to ensure data integrity
Sep 22 10:36:44 iron pengine[4616]: notice: Delaying fencing operations until there are resources to manage
Sep 22 10:36:44 iron pengine[4616]: warning: Node iron is unclean!
Sep 22 10:36:44 iron pengine[4616]: notice: Cannot fence unclean nodes until quorum is attained (or no-quorum-policy is set to ignore)
Sep 22 10:36:44 iron pengine[4616]: warning: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-20.bz2
Sep 22 10:36:44 iron pengine[4616]: notice: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
Sep 22 10:36:44 iron crmd[4617]: notice: Transition 2 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-20.bz2): Complete
Sep 22 10:36:44 iron crmd[4617]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

I'm starting to think there is some leftover config somewhere from all these attempts. Is there a way to completely reset all config related to NFS-Ganesha and start over?

On 22 September 2015 at 09:04, Soumya Koduri <skoduri at redhat.com> wrote:

> Hi Tiemen,
>
> Have added the steps to configure HA NFS in the below doc. Please verify
> if you have all the pre-requisites done & steps performed right.
>
> https://github.com/soumyakoduri/glusterdocs/blob/ha_guide/Administrator%20Guide/Configuring%20HA%20NFS%20Server.md
>
> Thanks,
> Soumya
>
> [...]
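The pengine errors above correspond to the two cluster properties that the ganesha-ha.sh run reported as failed earlier (stonith-enabled and no-quorum-policy). A minimal sketch of applying and verifying them by hand once corosync and pacemaker are actually running, using only commands that already appear in the logs plus standard pcs/pacemaker status tools:

    # Run on one HA node once corosync and pacemaker are up on both nodes.
    systemctl start corosync pacemaker pcsd   # start the stack by hand if the script's attempt timed out

    # The two properties the ganesha-ha.sh run reported as "failed":
    pcs property set stonith-enabled=false    # silences the "no STONITH resources" errors (test setups only)
    pcs property set no-quorum-policy=ignore  # matches the script's intent for a two-node cluster

    # Verify:
    pcs status                                # nodes online, properties applied, resources (if any)
    crm_verify -L -V                          # the check the pengine log itself suggests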
--
Tiemen Ruiten
Systems Engineer
R&D Media
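On the open question of wiping the half-built HA configuration and starting over, a rough sketch under stated assumptions (Gluster 3.7-era CLI and pcs 0.9 on CentOS 7). `pcs cluster destroy` is destructive, and the exact teardown supported by ganesha-ha.sh should be checked against the installed version:

    # Hedged sketch: reset a half-configured nfs-ganesha HA setup and retry.
    gluster nfs-ganesha disable               # counterpart of the enable command used in the thread

    # On each HA node (cobalt, iron):
    systemctl stop nfs-ganesha pacemaker corosync
    pcs cluster destroy                       # removes the local corosync/pacemaker cluster config (destructive)
    rm -f /etc/corosync/corosync.conf         # usually already removed by "pcs cluster destroy"
    rm -f /var/lib/pacemaker/cib/cib*         # drop any leftover CIB from earlier attempts

    # Then fix the underlying corosync start-up problem and re-run from one node:
    gluster nfs-ganesha enable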