Soumya Koduri
2015-Sep-22  07:04 UTC
[Gluster-users] Fwd: nfs-ganesha HA with arbiter volume
Hi Tiemen,

Have added the steps to configure HA NFS in the below doc. Please verify
if you have all the pre-requisites done & steps performed right.

https://github.com/soumyakoduri/glusterdocs/blob/ha_guide/Administrator%20Guide/Configuring%20HA%20NFS%20Server.md

Thanks,
Soumya

On 09/21/2015 09:21 PM, Tiemen Ruiten wrote:
> Whoops, replied off-list.
>
> Additionally I noticed that the generated corosync config is not valid,
> as there is no interface section:
>
> /etc/corosync/corosync.conf
>
> totem {
> version: 2
> secauth: off
> cluster_name: rd-ganesha-ha
> transport: udpu
> }
>
> nodelist {
>   node {
>         ring0_addr: cobalt
>         nodeid: 1
>         }
>   node {
>         ring0_addr: iron
>         nodeid: 2
>         }
> }
>
> quorum {
> provider: corosync_votequorum
> two_node: 1
> }
>
> logging {
> to_syslog: yes
> }
>
> ---------- Forwarded message ----------
> From: Tiemen Ruiten <t.ruiten at rdmedia.com>
> Date: 21 September 2015 at 17:16
> Subject: Re: [Gluster-users] nfs-ganesha HA with arbiter volume
> To: Jiffin Tony Thottan <jthottan at redhat.com>
>
> Could you point me to the latest documentation? I've been struggling to
> find something up-to-date. I believe I have all the prerequisites:
>
> - shared storage volume exists and is mounted
> - all nodes in hosts files
> - Gluster-NFS disabled
> - corosync, pacemaker and nfs-ganesha rpm's installed
>
> Anything I missed?
>
> Everything has been installed by RPM so is in the default locations:
> /usr/libexec/ganesha/ganesha-ha.sh
> /etc/ganesha/ganesha.conf (empty)
> /etc/ganesha/ganesha-ha.conf
>
> After I started the pcsd service manually, nfs-ganesha could be enabled
> successfully, but there was no virtual IP present on the interfaces and
> looking at the system log, I noticed corosync failed to start:
>
> - on the host where I issued the gluster nfs-ganesha enable command:
>
> Sep 21 17:07:18 iron systemd: Starting NFS-Ganesha file server...
> Sep 21 17:07:19 iron systemd: Started NFS-Ganesha file server.
> Sep 21 17:07:19 iron rpc.statd[2409]: Received SM_UNMON_ALL request from
> iron.int.rdmedia.com while not monitoring any hosts
> Sep 21 17:07:20 iron systemd: Starting Corosync Cluster Engine...
> Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync Cluster Engine
> ('2.3.4'): started and ready to provide service.
> Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync built-in
> features: dbus systemd xmlconf snmp pie relro bindnow
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing transport
> (UDP/IP Unicast).
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing
> transmit/receive security (NSS) crypto: none hash: none
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] The network interface
> [10.100.30.38] is now up.
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
> corosync configuration map access [0]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cmap
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
> corosync configuration service [1]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cfg
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
> corosync cluster closed process group service v1.01 [2]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cpg
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
> corosync profile loading service [4]
> Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Using quorum provider
> corosync_votequorum
> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
> members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
> corosync vote quorum service v1.0 [5]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: votequorum
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
> corosync cluster quorum service v0.1 [3]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: quorum
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member
> {10.100.30.38}
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member
> {10.100.30.37}
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership
> (10.100.30.38:104) was formed. Members joined: 1
> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
> members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
> members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
> members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Members[1]: 1
> Sep 21 17:07:20 iron corosync[3427]: [MAIN  ] Completed service
> synchronization, ready to provide service.
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership
> (10.100.30.37:108) was formed. Members joined: 1
> Sep 21 17:08:21 iron corosync: Starting Corosync Cluster Engine
> (corosync): [FAILED]
> Sep 21 17:08:21 iron systemd: corosync.service: control process exited,
> code=exited status=1
> Sep 21 17:08:21 iron systemd: Failed to start Corosync Cluster Engine.
> Sep 21 17:08:21 iron systemd: Unit corosync.service entered failed state.
>
>
> - on the other host:
>
> Sep 21 17:07:19 cobalt systemd: Starting Preprocess NFS configuration...
> Sep 21 17:07:19 cobalt systemd: Starting RPC Port Mapper.
> Sep 21 17:07:19 cobalt systemd: Reached target RPC Port Mapper.
> Sep 21 17:07:19 cobalt systemd: Starting Host and Network Name Lookups.
> Sep 21 17:07:19 cobalt systemd: Reached target Host and Network Name
> Lookups.
> Sep 21 17:07:19 cobalt systemd: Starting RPC bind service...
> Sep 21 17:07:19 cobalt systemd: Started Preprocess NFS configuration.
> Sep 21 17:07:19 cobalt systemd: Started RPC bind service.
> Sep 21 17:07:19 cobalt systemd: Starting NFS status monitor for NFSv2/3
> locking....
> Sep 21 17:07:19 cobalt rpc.statd[2662]: Version 1.3.0 starting
> Sep 21 17:07:19 cobalt rpc.statd[2662]: Flags: TI-RPC
> Sep 21 17:07:19 cobalt systemd: Started NFS status monitor for NFSv2/3
> locking..
> Sep 21 17:07:19 cobalt systemd: Starting NFS-Ganesha file server...
> Sep 21 17:07:19 cobalt systemd: Started NFS-Ganesha file server.
> Sep 21 17:07:19 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit
> capabilities (legacy support in use)
> Sep 21 17:07:19 cobalt logger: setting up rd-ganesha-ha
> Sep 21 17:07:19 cobalt rpc.statd[2662]: Received SM_UNMON_ALL request
> from cobalt.int.rdmedia.com while not monitoring any hosts
> Sep 21 17:07:19 cobalt logger: setting up cluster rd-ganesha-ha with the
> following cobalt iron
> Sep 21 17:07:20 cobalt systemd: Stopped Pacemaker High Availability
> Cluster Manager.
> Sep 21 17:07:20 cobalt systemd: Stopped Corosync Cluster Engine.
> Sep 21 17:07:20 cobalt systemd: Reloading.
> Sep 21 17:07:20 cobalt systemd:
> [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
> 'RemoveOnStop' in section 'Socket'
> Sep 21 17:07:20 cobalt systemd:
> [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
> 'RemoveOnStop' in section 'Socket'
> Sep 21 17:07:20 cobalt systemd: Reloading.
> Sep 21 17:07:20 cobalt systemd:
> [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
> 'RemoveOnStop' in section 'Socket'
> Sep 21 17:07:20 cobalt systemd:
> [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
> 'RemoveOnStop' in section 'Socket'
> Sep 21 17:07:20 cobalt systemd: Starting Corosync Cluster Engine...
> Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync Cluster Engine
> ('2.3.4'): started and ready to provide service.
> Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync built-in
> features: dbus systemd xmlconf snmp pie relro bindnow
> Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing transport
> (UDP/IP Unicast).
> Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing
> transmit/receive security (NSS) crypto: none hash: none
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] The network interface
> [10.100.30.37] is now up.
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
> corosync configuration map access [0]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cmap
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
> corosync configuration service [1]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cfg
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
> corosync cluster closed process group service v1.01 [2]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cpg
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
> corosync profile loading service [4]
> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Using quorum provider
> corosync_votequorum
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
> members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
> corosync vote quorum service v1.0 [5]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: votequorum
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
> corosync cluster quorum service v0.1 [3]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: quorum
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member
> {10.100.30.37}
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member
> {10.100.30.38}
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership
> (10.100.30.37:100) was formed. Members joined: 1
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
> members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
> members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
> members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
> Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service
> synchronization, ready to provide service.
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership
> (10.100.30.37:108) was formed. Members joined: 1
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
> members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
> Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service
> synchronization, ready to provide service.
> Sep 21 17:08:50 cobalt systemd: corosync.service operation timed out.
> Terminating.
> Sep 21 17:08:50 cobalt corosync: Starting Corosync Cluster Engine
> (corosync):
> Sep 21 17:08:50 cobalt systemd: Failed to start Corosync Cluster Engine.
> Sep 21 17:08:50 cobalt systemd: Unit corosync.service entered failed state.
> Sep 21 17:08:55 cobalt logger: warning: pcs property set
> no-quorum-policy=ignore failed
> Sep 21 17:08:55 cobalt logger: warning: pcs property set
> stonith-enabled=false failed
> Sep 21 17:08:55 cobalt logger: warning: pcs resource create nfs_start
> ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
> Sep 21 17:08:56 cobalt logger: warning: pcs resource delete
> nfs_start-clone failed
> Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-mon
> ganesha_mon --clone failed
> Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-grace
> ganesha_grace --clone failed
> Sep 21 17:08:57 cobalt logger: warning pcs resource create
> cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor
> interval=15s failed
> Sep 21 17:08:57 cobalt logger: warning: pcs resource create
> cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add
> cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
> cobalt-trigger_ip-1 then nfs-grace-clone failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
> nfs-grace-clone then cobalt-cluster_ip-1 failed
> Sep 21 17:08:57 cobalt logger: warning pcs resource create
> iron-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor
> interval=15s failed
> Sep 21 17:08:57 cobalt logger: warning: pcs resource create
> iron-trigger_ip-1 ocf:heartbeat:Dummy failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add
> iron-cluster_ip-1 with iron-trigger_ip-1 failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
> iron-trigger_ip-1 then nfs-grace-clone failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint order
> nfs-grace-clone then iron-cluster_ip-1 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
> cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
> cobalt-cluster_ip-1 prefers iron=1000 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
> cobalt-cluster_ip-1 prefers cobalt=2000 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
> iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
> iron-cluster_ip-1 prefers cobalt=1000 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
> iron-cluster_ip-1 prefers iron=2000 failed
> Sep 21 17:08:58 cobalt logger: warning pcs cluster cib-push
> /tmp/tmp.nXTfyA1GMR failed
> Sep 21 17:08:58 cobalt logger: warning: scp ganesha-ha.conf to cobalt failed
>
> BTW, I'm using CentOS 7. There are multiple network interfaces on the
> servers, could that be a problem??
>
>
> On 21 September 2015 at 11:48, Jiffin Tony Thottan <jthottan at redhat.com>
> wrote:
>
> On 21/09/15 13:56, Tiemen Ruiten wrote:
>> Hello Soumya, Kaleb, list,
>>
>> This Friday I created the gluster_shared_storage volume manually,
>> I just tried it with the command you supplied, but both have the
>> same result:
>>
>> from etc-glusterfs-glusterd.vol.log on the node where I issued the
>> command:
>>
>> [2015-09-21 07:59:47.756845] I [MSGID: 106474]
>> [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha
>> host found Hostname is cobalt
>> [2015-09-21 07:59:48.071755] I [MSGID: 106474]
>> [glusterd-ganesha.c:349:is_ganesha_host] 0-management: ganesha
>> host found Hostname is cobalt
>> [2015-09-21 07:59:48.653879] E [MSGID: 106470]
>> [glusterd-ganesha.c:264:glusterd_op_set_ganesha] 0-management:
>> Initial NFS-Ganesha set up failed
>
> As far as what I understand from the logs, it called
> setup_cluser()[calls `ganesha-ha.sh` script ] but script failed.
> Can u please provide following details :
> -Location of ganesha.sh file??
> -Location of ganesha-ha.conf, ganesha.conf files ?
>
>
> And also can u cross check whether all the prerequisites before HA
> setup satisfied ?
>
> --
> With Regards,
> Jiffin
>
>
>> [2015-09-21 07:59:48.653912] E [MSGID: 106123]
>> [glusterd-syncop.c:1404:gd_commit_op_phase] 0-management: Commit
>> of operation 'Volume (null)' failed on localhost : Failed to set
>> up HA config for NFS-Ganesha. Please check the log file for details
>> [2015-09-21 07:59:45.402458] I [MSGID: 106006]
>> [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify]
>> 0-management: nfs has disconnected from glusterd.
>> [2015-09-21 07:59:48.071578] I [MSGID: 106474]
>> [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha
>> host found Hostname is cobalt
>>
>> from etc-glusterfs-glusterd.vol.log on the other node:
>>
>> [2015-09-21 08:12:50.111877] E [MSGID: 106062]
>> [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable
>> to acquire volname
>> [2015-09-21 08:14:50.548087] E [MSGID: 106062]
>> [glusterd-op-sm.c:3635:glusterd_op_ac_lock] 0-management: Unable
>> to acquire volname
>> [2015-09-21 08:14:50.654746] I [MSGID: 106132]
>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs
>> already stopped
>> [2015-09-21 08:14:50.655095] I [MSGID: 106474]
>> [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha
>> host found Hostname is cobalt
>> [2015-09-21 08:14:51.287156] E [MSGID: 106062]
>> [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable
>> to acquire volname
>>
>>
>> from etc-glusterfs-glusterd.vol.log on the arbiter node:
>>
>> [2015-09-21 08:18:50.934713] E [MSGID: 101075]
>> [common-utils.c:3127:gf_is_local_addr] 0-management: error in
>> getaddrinfo: Name or service not known
>> [2015-09-21 08:18:51.504694] E [MSGID: 106062]
>> [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable
>> to acquire volname
>>
>> I have put the hostnames of all servers in my /etc/hosts file,
>> including the arbiter node.
>>
>>
>> On 18 September 2015 at 16:52, Soumya Koduri <skoduri at redhat.com>
>> wrote:
>>
>>     Hi Tiemen,
>>
>>     One of the pre-requisites before setting up nfs-ganesha HA is
>>     to create and mount shared_storage volume. Use below CLI for that
>>
>>     "gluster volume set all cluster.enable-shared-storage enable"
>>
>>     It shall create the volume and mount in all the nodes
>>     (including the arbiter node). Note this volume shall be
>>     mounted on all the nodes of the gluster storage pool (though
>>     in this case it may not be part of nfs-ganesha cluster).
>>
>>     So instead of manually creating those directory paths, please
>>     use above CLI and try re-configuring the setup.
>>
>>     Thanks,
>>     Soumya
>>
>>     On 09/18/2015 07:29 PM, Tiemen Ruiten wrote:
>>
>>         Hello Kaleb,
>>
>>         I don't:
>>
>>         # Name of the HA cluster created.
>>         # must be unique within the subnet
>>         HA_NAME="rd-ganesha-ha"
>>         #
>>         # The gluster server from which to mount the shared data volume.
>>         HA_VOL_SERVER="iron"
>>         #
>>         # N.B. you may use short names or long names; you may not use IP addrs.
>>         # Once you select one, stay with it as it will be mildly unpleasant to
>>         # clean up if you switch later on. Ensure that all names - short and/or
>>         # long - are in DNS or /etc/hosts on all machines in the cluster.
>>         #
>>         # The subset of nodes of the Gluster Trusted Pool that form the ganesha
>>         # HA cluster. Hostname is specified.
>>         HA_CLUSTER_NODES="cobalt,iron"
>>         #HA_CLUSTER_NODES="server1.lab.redhat.com,server2.lab.redhat.com,..."
>>         #
>>         # Virtual IPs for each of the nodes specified above.
>>         VIP_server1="10.100.30.101"
>>         VIP_server2="10.100.30.102"
>>         #VIP_server1_lab_redhat_com="10.0.2.1"
>>         #VIP_server2_lab_redhat_com="10.0.2.2"
>>
>>         hosts cobalt & iron are the data nodes, the arbiter
>>         ip/hostname (neon) isn't mentioned anywhere in this config file.
>>
>>
>>         On 18 September 2015 at 15:56, Kaleb S. KEITHLEY
>>         <kkeithle at redhat.com> wrote:
>>
>>             On 09/18/2015 09:46 AM, Tiemen Ruiten wrote:
>>             > Hello,
>>             >
>>             > I have a Gluster cluster with a single replica 3, arbiter 1 volume (so
>>             > two nodes with actual data, one arbiter node). I would like to setup
>>             > NFS-Ganesha HA for this volume but I'm having some difficulties.
>>             >
>>             > - I needed to create a directory /var/run/gluster/shared_storage
>>             > manually on all nodes, or the command 'gluster nfs-ganesha enable would
>>             > fail with the following error:
>>             > [2015-09-18 13:13:34.690416] E [MSGID: 106032]
>>             > [glusterd-ganesha.c:708:pre_setup] 0-THIS->name: mkdir() failed on path
>>             > /var/run/gluster/shared_storage/nfs-ganesha, [No such file or directory]
>>             >
>>             > - Then I found out that the command connects to the arbiter node as
>>             > well, but obviously I don't want to set up NFS-Ganesha there. Is it
>>             > actually possible to setup NFS-Ganesha HA with an arbiter node? If it's
>>             > possible, is there any documentation on how to do that?
>>             >
>>
>>             Please send the /etc/ganesha/ganesha-ha.conf file you're using.
>>
>>             Probably you have included the arbiter in your HA config; that would be
>>             a mistake.
>>
>>             --
>>
>>             Kaleb
>>
>>
>> --
>> Tiemen Ruiten
>> Systems Engineer
>> R&D Media
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>
>
> --
> Tiemen Ruiten
> Systems Engineer
> R&D Media
>
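
One detail in the quoted ganesha-ha.conf may be worth checking. HA_CLUSTER_NODES lists cobalt and iron, but the virtual IPs are defined as VIP_server1 and VIP_server2, and the Sep 21 log shows the IPaddr resources being created with an empty "ip=". Going by the commented example in the file (VIP_server1_lab_redhat_com for server1.lab.redhat.com), the key appears to be "VIP_" plus the node name with dots replaced by underscores. The Python sketch below checks a config for that. The parsing rules and the function names are assumptions made for illustration; they are not taken from ganesha-ha.sh.

```python
import re

# Matches simple shell-style assignments such as KEY="value".
ASSIGNMENT = re.compile(r'^([A-Za-z_][A-Za-z0-9_]*)="?([^"]*)"?$')


def parse_ha_conf(text):
    """Return a dict of KEY -> value, skipping comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = ASSIGNMENT.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def expected_vip_key(node):
    """Assumed naming rule: VIP_ + node name, with dots turned into underscores."""
    return "VIP_" + node.replace(".", "_")


def missing_vip_keys(settings):
    """List the VIP_ keys that HA_CLUSTER_NODES implies but the config lacks."""
    nodes = [n.strip() for n in settings.get("HA_CLUSTER_NODES", "").split(",")]
    return [expected_vip_key(n) for n in nodes
            if n and expected_vip_key(n) not in settings]


# The active (uncommented) settings from the config quoted in this thread.
SAMPLE = '''
HA_NAME="rd-ganesha-ha"
HA_VOL_SERVER="iron"
HA_CLUSTER_NODES="cobalt,iron"
VIP_server1="10.100.30.101"
VIP_server2="10.100.30.102"
'''

if __name__ == "__main__":
    print(missing_vip_keys(parse_ha_conf(SAMPLE)))
```

For the config as quoted, this reports VIP_cobalt and VIP_iron as missing, which would be consistent with the empty "ip=" in the Sep 21 log.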
Tiemen Ruiten
2015-Sep-22  09:05 UTC
[Gluster-users] Fwd: nfs-ganesha HA with arbiter volume
I had missed setting up passwordless SSH auth for the root user. However,
fixing that did not make a difference:
After verifying the prerequisites, I issued gluster nfs-ganesha enable on
node cobalt:
Sep 22 10:19:56 cobalt systemd: Starting Preprocess NFS configuration...
Sep 22 10:19:56 cobalt systemd: Starting RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Reached target RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Starting Host and Network Name Lookups.
Sep 22 10:19:56 cobalt systemd: Reached target Host and Network Name
Lookups.
Sep 22 10:19:56 cobalt systemd: Starting RPC bind service...
Sep 22 10:19:56 cobalt systemd: Started Preprocess NFS configuration.
Sep 22 10:19:56 cobalt systemd: Started RPC bind service.
Sep 22 10:19:56 cobalt systemd: Starting NFS status monitor for NFSv2/3
locking....
Sep 22 10:19:56 cobalt rpc.statd[2666]: Version 1.3.0 starting
Sep 22 10:19:56 cobalt rpc.statd[2666]: Flags: TI-RPC
Sep 22 10:19:56 cobalt systemd: Started NFS status monitor for NFSv2/3
locking..
Sep 22 10:19:56 cobalt systemd: Starting NFS-Ganesha file server...
Sep 22 10:19:56 cobalt systemd: Started NFS-Ganesha file server.
Sep 22 10:19:56 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit
capabilities (legacy support in use)
Sep 22 10:19:56 cobalt rpc.statd[2666]: Received SM_UNMON_ALL request from
cobalt.int.rdmedia.com while not monitoring any hosts
Sep 22 10:19:56 cobalt logger: setting up rd-ganesha-ha
Sep 22 10:19:56 cobalt logger: setting up cluster rd-ganesha-ha with the
following cobalt iron
Sep 22 10:19:57 cobalt systemd: Stopped Pacemaker High Availability Cluster
Manager.
Sep 22 10:19:57 cobalt systemd: Stopped Corosync Cluster Engine.
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
'RemoveOnStop'
in section 'Socket'
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
'RemoveOnStop'
in section 'Socket'
Sep 22 10:19:57 cobalt systemd:
[/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Starting Corosync Cluster Engine...
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN  ] Corosync Cluster Engine
('2.3.4'): started and ready to provide service.
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN  ] Corosync built-in features:
dbus systemd xmlconf snmp pie relro bindnow
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing transport
(UDP/IP Unicast).
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing
transmit/receive security (NSS) crypto: none hash: none
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] The network interface
[10.100.30.37] is now up.
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync configuration map access [0]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cmap
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync configuration service [1]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cfg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync cluster closed process group service v1.01 [2]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cpg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync profile loading service [4]
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Using quorum provider
corosync_votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster
members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync vote quorum service v1.0 [5]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1 [3]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: quorum
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member
{10.100.30.37}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member
{10.100.30.38}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership (
10.100.30.37:140) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership (
10.100.30.37:148) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster
members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster
members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Members[0]:
Sep 22 10:19:58 cobalt corosync[2816]: [MAIN  ] Completed service
synchronization, ready to provide service.
*Sep 22 10:21:27 cobalt systemd: corosync.service operation timed out.
Terminating.*
*Sep 22 10:21:27 cobalt corosync: Starting Corosync Cluster Engine
(corosync):*
*Sep 22 10:21:27 cobalt systemd: Failed to start Corosync Cluster Engine.*
*Sep 22 10:21:27 cobalt systemd: Unit corosync.service entered failed
state.*
Sep 22 10:21:32 cobalt logger: warning: pcs property set
no-quorum-policy=ignore failed
Sep 22 10:21:32 cobalt logger: warning: pcs property set
stonith-enabled=false failed
Sep 22 10:21:32 cobalt logger: warning: pcs resource create nfs_start
ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource delete nfs_start-clone
failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-mon
ganesha_mon --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-grace
ganesha_grace --clone failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create
cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.101 cidr_netmask=32
op monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create
cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add
cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order
cobalt-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order
nfs-grace-clone then cobalt-cluster_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create
iron-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.102 cidr_netmask=32 op
monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create
iron-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add
iron-cluster_ip-1 with iron-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order
iron-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint order
nfs-grace-clone then iron-cluster_ip-1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
cobalt-cluster_ip-1 prefers iron=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
cobalt-cluster_ip-1 prefers cobalt=2000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
iron-cluster_ip-1 prefers cobalt=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location
iron-cluster_ip-1 prefers iron=2000 failed
Sep 22 10:21:35 cobalt logger: warning pcs cluster cib-push
/tmp/tmp.yqLT4m75WG failed
Notice the failed corosync service in bold. I can't find any logs pointing
to a reason. Starting it manually is not a problem:
Sep 22 10:35:06 cobalt corosync: Starting Corosync Cluster Engine
(corosync): [  OK  ]
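
The timestamps give one hint. systemd logs the start of corosync.service at 10:19:57 and the timeout at 10:21:27, which is 90 seconds and matches the usual systemd default start timeout. Corosync itself reports "ready to provide service" at 10:19:58, so it looks as if the start job never reports completion rather than the daemon crashing. That reading is an inference from the log, not a confirmed diagnosis. A small Python sketch of the arithmetic follows; the year is an assumption, since syslog timestamps omit it, and the month parsing assumes an English locale.

```python
from datetime import datetime


def syslog_time(line, year=2015):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


# Two lines copied from the log above (the timeout line is unwrapped).
started = "Sep 22 10:19:57 cobalt systemd: Starting Corosync Cluster Engine..."
timed_out = "Sep 22 10:21:27 cobalt systemd: corosync.service operation timed out. Terminating."

elapsed = syslog_time(timed_out) - syslog_time(started)
print(elapsed.total_seconds())  # 90.0
```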
Then I noticed pacemaker was not running on either node. I started it manually
and saw the following in /var/log/messages on the other node:
Sep 22 10:36:43 iron cibadmin[4654]: notice: Invoked: /usr/sbin/cibadmin
--replace -o configuration -V --xml-pipe
Sep 22 10:36:43 iron crmd[4617]: notice: State transition S_IDLE ->
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Sep 22 10:36:44 iron pengine[4616]: notice: On loss of CCM Quorum: Ignore
Sep 22 10:36:44 iron pengine[4616]: error: Resource start-up disabled since
no STONITH resources have been defined
Sep 22 10:36:44 iron pengine[4616]: error: Either configure some or disable
STONITH with the stonith-enabled option
Sep 22 10:36:44 iron pengine[4616]: error: NOTE: Clusters with shared data
need STONITH to ensure data integrity
Sep 22 10:36:44 iron pengine[4616]: notice: Delaying fencing operations
until there are resources to manage
Sep 22 10:36:44 iron pengine[4616]: warning: Node iron is unclean!
Sep 22 10:36:44 iron pengine[4616]: notice: Cannot fence unclean nodes
until quorum is attained (or no-quorum-policy is set to ignore)
Sep 22 10:36:44 iron pengine[4616]: warning: Calculated Transition 2:
/var/lib/pacemaker/pengine/pe-warn-20.bz2
Sep 22 10:36:44 iron pengine[4616]: notice: Configuration ERRORs found
during PE processing.  Please run "crm_verify -L" to identify issues.
Sep 22 10:36:44 iron crmd[4617]: notice: Transition 2 (Complete=0,
Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-20.bz2): Complete
Sep 22 10:36:44 iron crmd[4617]: notice: State transition
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL
origin=notify_crmd ]
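
The STONITH errors fit the earlier log: "pcs property set stonith-enabled=false" is among the calls the setup script reported as failed, so that property was presumably never set. To list which pcs calls did not take effect, the failed commands can be pulled out of the syslog text. The Python sketch below assumes the "logger: warning[:] <command> failed" format seen above and log lines that have been unwrapped to one entry per line; the function name is made up for illustration.

```python
import re

# The setup script logs "warning: pcs ... failed"; a few entries lack the colon.
FAILED = re.compile(r"logger: warning:? (pcs .+?) failed\s*$")


def failed_pcs_commands(lines):
    """Return the pcs command lines that the log reports as failed."""
    commands = []
    for line in lines:
        match = FAILED.search(line)
        if match:
            commands.append(match.group(1))
    return commands


# A few unwrapped entries taken from the Sep 22 log in this message.
sample = [
    "Sep 22 10:21:32 cobalt logger: warning: pcs property set no-quorum-policy=ignore failed",
    "Sep 22 10:21:32 cobalt logger: warning: pcs property set stonith-enabled=false failed",
    "Sep 22 10:21:35 cobalt logger: warning pcs cluster cib-push /tmp/tmp.yqLT4m75WG failed",
    "Sep 22 10:19:57 cobalt systemd: Starting Corosync Cluster Engine...",
]

if __name__ == "__main__":
    for command in failed_pcs_commands(sample):
        print(command)
```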
I'm starting to think there is some leftover config somewhere from all
these attempts. Is there a way to completely reset all config related to
NFS-Ganesha and start over?
On 22 September 2015 at 09:04, Soumya Koduri <skoduri at redhat.com>
wrote:
> Hi Tiemen,
>
> Have added the steps to configure HA NFS in the below doc. Please verify
> if you have all the pre-requisites done & steps performed right.
>
>
>
> https://github.com/soumyakoduri/glusterdocs/blob/ha_guide/Administrator%20Guide/Configuring%20HA%20NFS%20Server.md
>
> Thanks,
> Soumya
>
> On 09/21/2015 09:21 PM, Tiemen Ruiten wrote:
>
>> Whoops, replied off-list.
>>
>> Additionally I noticed that the generated corosync config is not valid,
>> as there is no interface section:
>>
>> /etc/corosync/corosync.conf
>>
>> totem {
>> version: 2
>> secauth: off
>> cluster_name: rd-ganesha-ha
>> transport: udpu
>> }
>>
>> nodelist {
>>   node {
>>         ring0_addr: cobalt
>>         nodeid: 1
>>         }
>>   node {
>>         ring0_addr: iron
>>         nodeid: 2
>>         }
>> }
>>
>> quorum {
>> provider: corosync_votequorum
>> two_node: 1
>> }
>>
>> logging {
>> to_syslog: yes
>> }
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: *Tiemen Ruiten* <t.ruiten at rdmedia.com>
>> Date: 21 September 2015 at 17:16
>> Subject: Re: [Gluster-users] nfs-ganesha HA with arbiter volume
>> To: Jiffin Tony Thottan <jthottan at redhat.com>
>>
>>
>> Could you point me to the latest documentation? I've been struggling to
>> find something up-to-date. I believe I have all the prerequisites:
>>
>> - shared storage volume exists and is mounted
>> - all nodes in hosts files
>> - Gluster-NFS disabled
>> - corosync, pacemaker and nfs-ganesha rpm's installed
>>
>> Anything I missed?
>>
>> Everything has been installed by RPM so is in the default locations:
>> /usr/libexec/ganesha/ganesha-ha.sh
>> /etc/ganesha/ganesha.conf (empty)
>> /etc/ganesha/ganesha-ha.conf
>>
>> After I started the pcsd service manually, nfs-ganesha could be enabled
>> successfully, but there was no virtual IP present on the interfaces and
>> looking at the system log, I noticed corosync failed to start:
>>
>> - on the host where I issued the gluster nfs-ganesha enable command:
>>
>> Sep 21 17:07:18 iron systemd: Starting NFS-Ganesha file server...
>> Sep 21 17:07:19 iron systemd: Started NFS-Ganesha file server.
>> Sep 21 17:07:19 iron rpc.statd[2409]: Received SM_UNMON_ALL request from
>> iron.int.rdmedia.com while not monitoring any hosts
>> Sep 21 17:07:20 iron systemd: Starting Corosync Cluster Engine...
>> Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync Cluster Engine
>> ('2.3.4'): started and ready to provide service.
>> Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync built-in
>> features: dbus systemd xmlconf snmp pie relro bindnow
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing transport
>> (UDP/IP Unicast).
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing
>> transmit/receive security (NSS) crypto: none hash: none
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] The network interface
>> [10.100.30.38] is now up.
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync configuration map access [0]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cmap
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync configuration service [1]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cfg
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync cluster closed process group service v1.01 [2]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cpg
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync profile loading service [4]
>> Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Using quorum provider
>> corosync_votequorum
>> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync vote quorum service v1.0 [5]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: votequorum
>> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded:
>> corosync cluster quorum service v0.1 [3]
>> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: quorum
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member
>> {10.100.30.38}
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member
>> {10.100.30.37}
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership
>> (10.100.30.38:104) was formed. Members joined: 1
>> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Members[1]: 1
>> Sep 21 17:07:20 iron corosync[3427]: [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership
>> (10.100.30.37:108) was formed. Members joined: 1
>>
>> Sep 21 17:08:21 iron corosync: Starting Corosync Cluster Engine
>> (corosync): [FAILED]
>> Sep 21 17:08:21 iron systemd: corosync.service: control process exited,
>> code=exited status=1
>> Sep 21 17:08:21 iron systemd: Failed to start Corosync Cluster Engine.
>> Sep 21 17:08:21 iron systemd: Unit corosync.service entered failed state.
>>
>>
>> - on the other host:
>>
>> Sep 21 17:07:19 cobalt systemd: Starting Preprocess NFS configuration...
>> Sep 21 17:07:19 cobalt systemd: Starting RPC Port Mapper.
>> Sep 21 17:07:19 cobalt systemd: Reached target RPC Port Mapper.
>> Sep 21 17:07:19 cobalt systemd: Starting Host and Network Name Lookups.
>> Sep 21 17:07:19 cobalt systemd: Reached target Host and Network Name
>> Lookups.
>> Sep 21 17:07:19 cobalt systemd: Starting RPC bind service...
>> Sep 21 17:07:19 cobalt systemd: Started Preprocess NFS configuration.
>> Sep 21 17:07:19 cobalt systemd: Started RPC bind service.
>> Sep 21 17:07:19 cobalt systemd: Starting NFS status monitor for NFSv2/3
>> locking....
>> Sep 21 17:07:19 cobalt rpc.statd[2662]: Version 1.3.0 starting
>> Sep 21 17:07:19 cobalt rpc.statd[2662]: Flags: TI-RPC
>> Sep 21 17:07:19 cobalt systemd: Started NFS status monitor for NFSv2/3
>> locking..
>> Sep 21 17:07:19 cobalt systemd: Starting NFS-Ganesha file server...
>> Sep 21 17:07:19 cobalt systemd: Started NFS-Ganesha file server.
>> Sep 21 17:07:19 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit
>> capabilities (legacy support in use)
>> Sep 21 17:07:19 cobalt logger: setting up rd-ganesha-ha
>> Sep 21 17:07:19 cobalt rpc.statd[2662]: Received SM_UNMON_ALL request
>> from cobalt.int.rdmedia.com while not monitoring any hosts
>> Sep 21 17:07:19 cobalt logger: setting up cluster rd-ganesha-ha with the
>> following cobalt iron
>> Sep 21 17:07:20 cobalt systemd: Stopped Pacemaker High Availability
>> Cluster Manager.
>> Sep 21 17:07:20 cobalt systemd: Stopped Corosync Cluster Engine.
>> Sep 21 17:07:20 cobalt systemd: Reloading.
>> Sep 21 17:07:20 cobalt systemd:
>> [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
>> 'RemoveOnStop' in section 'Socket'
>> Sep 21 17:07:20 cobalt systemd:
>> [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
>> 'RemoveOnStop' in section 'Socket'
>> Sep 21 17:07:20 cobalt systemd: Reloading.
>> Sep 21 17:07:20 cobalt systemd:
>> [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue
>> 'RemoveOnStop' in section 'Socket'
>> Sep 21 17:07:20 cobalt systemd:
>> [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue
>> 'RemoveOnStop' in section 'Socket'
>> Sep 21 17:07:20 cobalt systemd: Starting Corosync Cluster Engine...
>> Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync Cluster Engine
>> ('2.3.4'): started and ready to provide service.
>> Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync built-in
>> features: dbus systemd xmlconf snmp pie relro bindnow
>> Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing transport
>> (UDP/IP Unicast).
>> Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing
>> transmit/receive security (NSS) crypto: none hash: none
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] The network interface
>> [10.100.30.37] is now up.
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync configuration map access [0]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cmap
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync configuration service [1]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cfg
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync cluster closed process group service v1.01 [2]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cpg
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync profile loading service [4]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Using quorum provider
>> corosync_votequorum
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync vote quorum service v1.0 [5]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: votequorum
>> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded:
>> corosync cluster quorum service v0.1 [3]
>> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: quorum
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member
>> {10.100.30.37}
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member
>> {10.100.30.38}
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership
>> (10.100.30.37:100) was formed. Members joined: 1
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
>> Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership
>> (10.100.30.37:108) was formed. Members joined: 1
>> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster
>> members. Current votes: 1 expected_votes: 2
>> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
>> Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> Sep 21 17:08:50 cobalt systemd: corosync.service operation timed out.
>> Terminating.
>> Sep 21 17:08:50 cobalt corosync: Starting Corosync Cluster Engine
>> (corosync):
>> Sep 21 17:08:50 cobalt systemd: Failed to start Corosync Cluster Engine.
>> Sep 21 17:08:50 cobalt systemd: Unit corosync.service entered failed
>> state.
>> Sep 21 17:08:55 cobalt logger: warning: pcs property set
>> no-quorum-policy=ignore failed
>> Sep 21 17:08:55 cobalt logger: warning: pcs property set
>> stonith-enabled=false failed
>> Sep 21 17:08:55 cobalt logger: warning: pcs resource create nfs_start
>> ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
>> Sep 21 17:08:56 cobalt logger: warning: pcs resource delete
>> nfs_start-clone failed
>> Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-mon
>> ganesha_mon --clone failed
>> Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-grace
>> ganesha_grace --clone failed
>> Sep 21 17:08:57 cobalt logger: warning pcs resource create
>> cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor
>> interval=15s failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs resource create
>> cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add
>> cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
>> cobalt-trigger_ip-1 then nfs-grace-clone failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
>> nfs-grace-clone then cobalt-cluster_ip-1 failed
>> Sep 21 17:08:57 cobalt logger: warning pcs resource create
>> iron-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor
>> interval=15s failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs resource create
>> iron-trigger_ip-1 ocf:heartbeat:Dummy failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add
>> iron-cluster_ip-1 with iron-trigger_ip-1 failed
>> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order
>> iron-trigger_ip-1 then nfs-grace-clone failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint order
>> nfs-grace-clone then iron-cluster_ip-1 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> cobalt-cluster_ip-1 prefers iron=1000 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> cobalt-cluster_ip-1 prefers cobalt=2000 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> iron-cluster_ip-1 prefers cobalt=1000 failed
>> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location
>> iron-cluster_ip-1 prefers iron=2000 failed
>> Sep 21 17:08:58 cobalt logger: warning pcs cluster cib-push
>> /tmp/tmp.nXTfyA1GMR failed
>> Sep 21 17:08:58 cobalt logger: warning: scp ganesha-ha.conf to cobalt
>> failed
>>
>> BTW, I'm using CentOS 7. There are multiple network interfaces on the
>> servers, could that be a problem?
>>
>>
>>
>>
>> On 21 September 2015 at 11:48, Jiffin Tony Thottan <jthottan at redhat.com>
>> wrote:
>>
>>
>>
>>     On 21/09/15 13:56, Tiemen Ruiten wrote:
>>
>>>     Hello Soumya, Kaleb, list,
>>>
>>>     This Friday I created the gluster_shared_storage volume manually.
>>>     I just tried it with the command you supplied, but both have the
>>>     same result:
>>>
>>>     from etc-glusterfs-glusterd.vol.log on the node where I issued the
>>>     command:
>>>
>>>     [2015-09-21 07:59:47.756845] I [MSGID: 106474]
>>>     [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha
>>>     host found Hostname is cobalt
>>>     [2015-09-21 07:59:48.071755] I [MSGID: 106474]
>>>     [glusterd-ganesha.c:349:is_ganesha_host] 0-management: ganesha
>>>     host found Hostname is cobalt
>>>     [2015-09-21 07:59:48.653879] E [MSGID: 106470]
>>>     [glusterd-ganesha.c:264:glusterd_op_set_ganesha] 0-management:
>>>     Initial NFS-Ganesha set up failed
>>>
>>
>>     As far as I understand from the logs, it called
>>     setup_cluster() [which calls the `ganesha-ha.sh` script], but the script failed.
>>     Can you please provide the following details:
>>     - Location of the ganesha-ha.sh file?
>>     - Location of the ganesha-ha.conf and ganesha.conf files?
>>
>>
>>     Also, can you cross-check whether all the prerequisites for the HA
>>     setup are satisfied?
>>
>>     --
>>     With Regards,
>>     Jiffin
>>
>>
>>>     [2015-09-21 07:59:48.653912] E [MSGID: 106123]
>>>     [glusterd-syncop.c:1404:gd_commit_op_phase] 0-management: Commit
>>>     of operation 'Volume (null)' failed on localhost : Failed to set
>>>     up HA config for NFS-Ganesha. Please check the log file for details
>>>     [2015-09-21 07:59:45.402458] I [MSGID: 106006]
>>>     [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify]
>>>     0-management: nfs has disconnected from glusterd.
>>>     [2015-09-21 07:59:48.071578] I [MSGID: 106474]
>>>     [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha
>>>     host found Hostname is cobalt
>>>
>>>     from etc-glusterfs-glusterd.vol.log on the other node:
>>>
>>>     [2015-09-21 08:12:50.111877] E [MSGID: 106062]
>>>     [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable
>>>     to acquire volname
>>>     [2015-09-21 08:14:50.548087] E [MSGID: 106062]
>>>     [glusterd-op-sm.c:3635:glusterd_op_ac_lock] 0-management: Unable
>>>     to acquire volname
>>>     [2015-09-21 08:14:50.654746] I [MSGID: 106132]
>>>     [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs
>>>     already stopped
>>>     [2015-09-21 08:14:50.655095] I [MSGID: 106474]
>>>     [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha
>>>     host found Hostname is cobalt
>>>     [2015-09-21 08:14:51.287156] E [MSGID: 106062]
>>>     [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable
>>>     to acquire volname
>>>
>>>
>>>     from etc-glusterfs-glusterd.vol.log on the arbiter node:
>>>
>>>     [2015-09-21 08:18:50.934713] E [MSGID: 101075]
>>>     [common-utils.c:3127:gf_is_local_addr] 0-management: error in
>>>     getaddrinfo: Name or service not known
>>>     [2015-09-21 08:18:51.504694] E [MSGID: 106062]
>>>     [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable
>>>     to acquire volname
>>>
>>>     I have put the hostnames of all servers in my /etc/hosts file,
>>>     including the arbiter node.
>>>
>>>
>>>     On 18 September 2015 at 16:52, Soumya Koduri <skoduri at redhat.com> wrote:
>>>
>>>         Hi Tiemen,
>>>
>>>         One of the pre-requisites before setting up nfs-ganesha HA is
>>>         to create and mount the shared_storage volume. Use the CLI
>>>         below for that:
>>>
>>>         "gluster volume set all cluster.enable-shared-storage enable"
>>>
>>>         It creates the volume and mounts it on all the nodes
>>>         (including the arbiter node). Note that this volume must be
>>>         mounted on all the nodes of the gluster storage pool (even
>>>         those that, as in this case, are not part of the nfs-ganesha
>>>         cluster).
>>>
>>>         So instead of manually creating those directory paths, please
>>>         use the above CLI and try re-configuring the setup.
>>>
>>>         Thanks,
>>>         Soumya
>>>
>>>         On 09/18/2015 07:29 PM, Tiemen Ruiten wrote:
>>>
>>>             Hello Kaleb,
>>>
>>>             I don't:
>>>
>>>             # Name of the HA cluster created.
>>>             # must be unique within the subnet
>>>             HA_NAME="rd-ganesha-ha"
>>>             #
>>>             # The gluster server from which to mount the shared data volume.
>>>             HA_VOL_SERVER="iron"
>>>             #
>>>             # N.B. you may use short names or long names; you may not use IP addrs.
>>>             # Once you select one, stay with it as it will be mildly unpleasant to
>>>             # clean up if you switch later on. Ensure that all names - short and/or
>>>             # long - are in DNS or /etc/hosts on all machines in the cluster.
>>>             #
>>>             # The subset of nodes of the Gluster Trusted Pool that form the ganesha
>>>             # HA cluster. Hostname is specified.
>>>             HA_CLUSTER_NODES="cobalt,iron"
>>>             #HA_CLUSTER_NODES="server1.lab.redhat.com,server2.lab.redhat.com,..."
>>>             #
>>>             # Virtual IPs for each of the nodes specified above.
>>>             VIP_server1="10.100.30.101"
>>>             VIP_server2="10.100.30.102"
>>>             #VIP_server1_lab_redhat_com="10.0.2.1"
>>>             #VIP_server2_lab_redhat_com="10.0.2.2"
>>>
>>>             hosts cobalt & iron are the data nodes, the arbiter
>>>             ip/hostname (neon)
>>>             isn't mentioned anywhere in this config file.
>>>
>>>
>>>             On 18 September 2015 at 15:56, Kaleb S. KEITHLEY
>>>             <kkeithle at redhat.com> wrote:
>>>
>>>                 On 09/18/2015 09:46 AM, Tiemen Ruiten wrote:
>>>                 > Hello,
>>>                 >
>>>                 > I have a Gluster cluster with a single replica 3, arbiter 1 volume (so
>>>                 > two nodes with actual data, one arbiter node). I would like to setup
>>>                 > NFS-Ganesha HA for this volume but I'm having some difficulties.
>>>                 >
>>>                 > - I needed to create a directory /var/run/gluster/shared_storage
>>>                 > manually on all nodes, or the command 'gluster nfs-ganesha enable' would
>>>                 > fail with the following error:
>>>                 > [2015-09-18 13:13:34.690416] E [MSGID: 106032]
>>>                 > [glusterd-ganesha.c:708:pre_setup] 0-THIS->name: mkdir() failed on path
>>>                 > /var/run/gluster/shared_storage/nfs-ganesha, [No such file or directory]
>>>                 >
>>>                 > - Then I found out that the command connects to the arbiter node as
>>>                 > well, but obviously I don't want to set up NFS-Ganesha there. Is it
>>>                 > actually possible to setup NFS-Ganesha HA with an arbiter node? If it's
>>>                 > possible, is there any documentation on how to do that?
>>>                 >
>>>
>>>                 Please send the /etc/ganesha/ganesha-ha.conf file you're using.
>>>
>>>                 Probably you have included the arbiter in your HA config; that would be
>>>                 a mistake.
>>>
>>>                 --
>>>
>>>                 Kaleb
>>>
>>>
>>>
>>>
>>>             --
>>>             Tiemen Ruiten
>>>             Systems Engineer
>>>             R&D Media
>>>
>>>
>>>             _______________________________________________
>>>             Gluster-users mailing list
>>>             Gluster-users at gluster.org
>>>             http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>>
>>>
>>>     --
>>>     Tiemen Ruiten
>>>     Systems Engineer
>>>     R&D Media
>>>
>>>
>>>     _______________________________________________
>>>     Gluster-users mailing list
>>>     Gluster-users at gluster.org
>>>     http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>     _______________________________________________
>>     Gluster-users mailing list
>>     Gluster-users at gluster.org
>>     http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>> --
>> Tiemen Ruiten
>> Systems Engineer
>> R&D Media
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
-- 
Tiemen Ruiten
Systems Engineer
R&D Media