Soumya Koduri
2015-Sep-22 07:04 UTC
[Gluster-users] Fwd: nfs-ganesha HA with arbiter volume
Hi Tiemen,

I have added the steps to configure HA NFS in the doc below. Please verify that you have all the prerequisites in place and have performed the steps correctly.

https://github.com/soumyakoduri/glusterdocs/blob/ha_guide/Administrator%20Guide/Configuring%20HA%20NFS%20Server.md

Thanks,
Soumya

On 09/21/2015 09:21 PM, Tiemen Ruiten wrote:
> Whoops, replied off-list.
>
> Additionally I noticed that the generated corosync config is not valid,
> as there is no interface section:
>
> /etc/corosync/corosync.conf
>
> totem {
>   version: 2
>   secauth: off
>   cluster_name: rd-ganesha-ha
>   transport: udpu
> }
>
> nodelist {
>   node {
>     ring0_addr: cobalt
>     nodeid: 1
>   }
>   node {
>     ring0_addr: iron
>     nodeid: 2
>   }
> }
>
> quorum {
>   provider: corosync_votequorum
>   two_node: 1
> }
>
> logging {
>   to_syslog: yes
> }
>
> ---------- Forwarded message ----------
> From: Tiemen Ruiten <t.ruiten at rdmedia.com>
> Date: 21 September 2015 at 17:16
> Subject: Re: [Gluster-users] nfs-ganesha HA with arbiter volume
> To: Jiffin Tony Thottan <jthottan at redhat.com>
>
> Could you point me to the latest documentation? I've been struggling to
> find something up-to-date. I believe I have all the prerequisites:
>
> - shared storage volume exists and is mounted
> - all nodes in hosts files
> - Gluster-NFS disabled
> - corosync, pacemaker and nfs-ganesha RPMs installed
>
> Anything I missed?
>
> Everything has been installed by RPM, so it is in the default locations:
> /usr/libexec/ganesha/ganesha-ha.sh
> /etc/ganesha/ganesha.conf (empty)
> /etc/ganesha/ganesha-ha.conf
>
> After I started the pcsd service manually, nfs-ganesha could be enabled
> successfully, but there was no virtual IP present on the interfaces and,
> looking at the system log, I noticed corosync failed to start:
>
> - on the host where I issued the gluster nfs-ganesha enable command:
>
> Sep 21 17:07:18 iron systemd: Starting NFS-Ganesha file server...
> Sep 21 17:07:19 iron systemd: Started NFS-Ganesha file server.
> Sep 21 17:07:19 iron rpc.statd[2409]: Received SM_UNMON_ALL request from iron.int.rdmedia.com while not monitoring any hosts
> Sep 21 17:07:20 iron systemd: Starting Corosync Cluster Engine...
> Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
> Sep 21 17:07:20 iron corosync[3426]: [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing transport (UDP/IP Unicast).
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] The network interface [10.100.30.38] is now up.
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded: corosync configuration map access [0]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cmap
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded: corosync configuration service [1]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cfg
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: cpg
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded: corosync profile loading service [4]
> Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Using quorum provider corosync_votequorum
> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: votequorum
> Sep 21 17:07:20 iron corosync[3427]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
> Sep 21 17:07:20 iron corosync[3427]: [QB    ] server name: quorum
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member {10.100.30.38}
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member {10.100.30.37}
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership (10.100.30.38:104) was formed. Members joined: 1
> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Members[1]: 1
> Sep 21 17:07:20 iron corosync[3427]: [MAIN  ] Completed service synchronization, ready to provide service.
> Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership (10.100.30.37:108) was formed. Members joined: 1
> Sep 21 17:08:21 iron corosync: Starting Corosync Cluster Engine (corosync): [FAILED]
> Sep 21 17:08:21 iron systemd: corosync.service: control process exited, code=exited status=1
> Sep 21 17:08:21 iron systemd: Failed to start Corosync Cluster Engine.
> Sep 21 17:08:21 iron systemd: Unit corosync.service entered failed state.
>
> - on the other host:
>
> Sep 21 17:07:19 cobalt systemd: Starting Preprocess NFS configuration...
> Sep 21 17:07:19 cobalt systemd: Starting RPC Port Mapper.
> Sep 21 17:07:19 cobalt systemd: Reached target RPC Port Mapper.
> Sep 21 17:07:19 cobalt systemd: Starting Host and Network Name Lookups.
> Sep 21 17:07:19 cobalt systemd: Reached target Host and Network Name Lookups.
> Sep 21 17:07:19 cobalt systemd: Starting RPC bind service...
> Sep 21 17:07:19 cobalt systemd: Started Preprocess NFS configuration.
> Sep 21 17:07:19 cobalt systemd: Started RPC bind service.
> Sep 21 17:07:19 cobalt systemd: Starting NFS status monitor for NFSv2/3 locking....
> Sep 21 17:07:19 cobalt rpc.statd[2662]: Version 1.3.0 starting
> Sep 21 17:07:19 cobalt rpc.statd[2662]: Flags: TI-RPC
> Sep 21 17:07:19 cobalt systemd: Started NFS status monitor for NFSv2/3 locking..
> Sep 21 17:07:19 cobalt systemd: Starting NFS-Ganesha file server...
> Sep 21 17:07:19 cobalt systemd: Started NFS-Ganesha file server.
> Sep 21 17:07:19 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit capabilities (legacy support in use)
> Sep 21 17:07:19 cobalt logger: setting up rd-ganesha-ha
> Sep 21 17:07:19 cobalt rpc.statd[2662]: Received SM_UNMON_ALL request from cobalt.int.rdmedia.com while not monitoring any hosts
> Sep 21 17:07:19 cobalt logger: setting up cluster rd-ganesha-ha with the following cobalt iron
> Sep 21 17:07:20 cobalt systemd: Stopped Pacemaker High Availability Cluster Manager.
> Sep 21 17:07:20 cobalt systemd: Stopped Corosync Cluster Engine.
> Sep 21 17:07:20 cobalt systemd: Reloading.
> Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
> Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
> Sep 21 17:07:20 cobalt systemd: Reloading.
> Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
> Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
> Sep 21 17:07:20 cobalt systemd: Starting Corosync Cluster Engine...
> Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
> Sep 21 17:07:20 cobalt corosync[2816]: [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
> Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing transport (UDP/IP Unicast).
> Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] The network interface [10.100.30.37] is now up.
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded: corosync configuration map access [0]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cmap
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded: corosync configuration service [1]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cfg
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: cpg
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded: corosync profile loading service [4]
> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Using quorum provider corosync_votequorum
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: votequorum
> Sep 21 17:07:21 cobalt corosync[2817]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
> Sep 21 17:07:21 cobalt corosync[2817]: [QB    ] server name: quorum
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member {10.100.30.37}
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member {10.100.30.38}
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership (10.100.30.37:100) was formed. Members joined: 1
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
> Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service synchronization, ready to provide service.
> Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership (10.100.30.37:108) was formed. Members joined: 1
> Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
> Sep 21 17:07:21 cobalt corosync[2817]: [MAIN  ] Completed service synchronization, ready to provide service.
> Sep 21 17:08:50 cobalt systemd: corosync.service operation timed out. Terminating.
> Sep 21 17:08:50 cobalt corosync: Starting Corosync Cluster Engine (corosync):
> Sep 21 17:08:50 cobalt systemd: Failed to start Corosync Cluster Engine.
> Sep 21 17:08:50 cobalt systemd: Unit corosync.service entered failed state.
> Sep 21 17:08:55 cobalt logger: warning: pcs property set no-quorum-policy=ignore failed
> Sep 21 17:08:55 cobalt logger: warning: pcs property set stonith-enabled=false failed
> Sep 21 17:08:55 cobalt logger: warning: pcs resource create nfs_start ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
> Sep 21 17:08:56 cobalt logger: warning: pcs resource delete nfs_start-clone failed
> Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-mon ganesha_mon --clone failed
> Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-grace ganesha_grace --clone failed
> Sep 21 17:08:57 cobalt logger: warning pcs resource create cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor interval=15s failed
> Sep 21 17:08:57 cobalt logger: warning: pcs resource create cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order cobalt-trigger_ip-1 then nfs-grace-clone failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order nfs-grace-clone then cobalt-cluster_ip-1 failed
> Sep 21 17:08:57 cobalt logger: warning pcs resource create iron-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor interval=15s failed
> Sep 21 17:08:57 cobalt logger: warning: pcs resource create iron-trigger_ip-1 ocf:heartbeat:Dummy failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add iron-cluster_ip-1 with iron-trigger_ip-1 failed
> Sep 21 17:08:57 cobalt logger: warning: pcs constraint order iron-trigger_ip-1 then nfs-grace-clone failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint order nfs-grace-clone then iron-cluster_ip-1 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers iron=1000 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers cobalt=2000 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers cobalt=1000 failed
> Sep 21 17:08:58 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers iron=2000 failed
> Sep 21 17:08:58 cobalt logger: warning pcs cluster cib-push /tmp/tmp.nXTfyA1GMR failed
> Sep 21 17:08:58 cobalt logger: warning: scp ganesha-ha.conf to cobalt failed
>
> BTW, I'm using CentOS 7.
> There are multiple network interfaces on the servers, could that be a problem?
>
>
> On 21 September 2015 at 11:48, Jiffin Tony Thottan <jthottan at redhat.com> wrote:
>
> On 21/09/15 13:56, Tiemen Ruiten wrote:
>> Hello Soumya, Kaleb, list,
>>
>> This Friday I created the gluster_shared_storage volume manually,
>> I just tried it with the command you supplied, but both have the
>> same result:
>>
>> from etc-glusterfs-glusterd.vol.log on the node where I issued the
>> command:
>>
>> [2015-09-21 07:59:47.756845] I [MSGID: 106474] [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha host found Hostname is cobalt
>> [2015-09-21 07:59:48.071755] I [MSGID: 106474] [glusterd-ganesha.c:349:is_ganesha_host] 0-management: ganesha host found Hostname is cobalt
>> [2015-09-21 07:59:48.653879] E [MSGID: 106470] [glusterd-ganesha.c:264:glusterd_op_set_ganesha] 0-management: Initial NFS-Ganesha set up failed
>
> As far as I understand from the logs, it called setup_cluster() [which calls the `ganesha-ha.sh` script], but the script failed.
> Can you please provide the following details:
> - Location of the ganesha.sh file?
> - Location of the ganesha-ha.conf and ganesha.conf files?
>
> And also, can you cross-check whether all the prerequisites before the HA setup are satisfied?
>
> --
> With Regards,
> Jiffin
>
>> [2015-09-21 07:59:48.653912] E [MSGID: 106123] [glusterd-syncop.c:1404:gd_commit_op_phase] 0-management: Commit of operation 'Volume (null)' failed on localhost : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
>> [2015-09-21 07:59:45.402458] I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
>> [2015-09-21 07:59:48.071578] I [MSGID: 106474] [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha host found Hostname is cobalt
>>
>> from etc-glusterfs-glusterd.vol.log on the other node:
>>
>> [2015-09-21 08:12:50.111877] E [MSGID: 106062] [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable to acquire volname
>> [2015-09-21 08:14:50.548087] E [MSGID: 106062] [glusterd-op-sm.c:3635:glusterd_op_ac_lock] 0-management: Unable to acquire volname
>> [2015-09-21 08:14:50.654746] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
>> [2015-09-21 08:14:50.655095] I [MSGID: 106474] [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha host found Hostname is cobalt
>> [2015-09-21 08:14:51.287156] E [MSGID: 106062] [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable to acquire volname
>>
>> from etc-glusterfs-glusterd.vol.log on the arbiter node:
>>
>> [2015-09-21 08:18:50.934713] E [MSGID: 101075] [common-utils.c:3127:gf_is_local_addr] 0-management: error in getaddrinfo: Name or service not known
>> [2015-09-21 08:18:51.504694] E [MSGID: 106062] [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable to acquire volname
>>
>> I have put the hostnames of all servers in my /etc/hosts file,
>> including the arbiter node.
>>
>> On 18 September 2015 at 16:52, Soumya Koduri <skoduri at redhat.com> wrote:
>>
>> Hi Tiemen,
>>
>> One of the pre-requisites before setting up nfs-ganesha HA is
>> to create and mount the shared_storage volume. Use the below CLI for that:
>>
>> "gluster volume set all cluster.enable-shared-storage enable"
>>
>> It shall create the volume and mount it on all the nodes
>> (including the arbiter node). Note this volume shall be
>> mounted on all the nodes of the gluster storage pool (though
>> in this case it may not be part of the nfs-ganesha cluster).
>>
>> So instead of manually creating those directory paths, please
>> use the above CLI and try re-configuring the setup.
>>
>> Thanks,
>> Soumya
>>
>> On 09/18/2015 07:29 PM, Tiemen Ruiten wrote:
>>
>> Hello Kaleb,
>>
>> I don't:
>>
>> # Name of the HA cluster created.
>> # must be unique within the subnet
>> HA_NAME="rd-ganesha-ha"
>> #
>> # The gluster server from which to mount the shared data volume.
>> HA_VOL_SERVER="iron"
>> #
>> # N.B. you may use short names or long names; you may not use IP addrs.
>> # Once you select one, stay with it as it will be mildly unpleasant to
>> # clean up if you switch later on. Ensure that all names - short and/or
>> # long - are in DNS or /etc/hosts on all machines in the cluster.
>> #
>> # The subset of nodes of the Gluster Trusted Pool that form the ganesha
>> # HA cluster. Hostname is specified.
>> HA_CLUSTER_NODES="cobalt,iron"
>> #HA_CLUSTER_NODES="server1.lab.redhat.com,server2.lab.redhat.com,..."
>> #
>> # Virtual IPs for each of the nodes specified above.
>> VIP_server1="10.100.30.101"
>> VIP_server2="10.100.30.102"
>> #VIP_server1_lab_redhat_com="10.0.2.1"
>> #VIP_server2_lab_redhat_com="10.0.2.2"
>>
>> hosts cobalt & iron are the data nodes, the arbiter ip/hostname (neon)
>> isn't mentioned anywhere in this config file.
>>
>> On 18 September 2015 at 15:56, Kaleb S. KEITHLEY <kkeithle at redhat.com> wrote:
>>
>>    On 09/18/2015 09:46 AM, Tiemen Ruiten wrote:
>>    > Hello,
>>    >
>>    > I have a Gluster cluster with a single replica 3, arbiter 1 volume (so
>>    > two nodes with actual data, one arbiter node). I would like to setup
>>    > NFS-Ganesha HA for this volume but I'm having some difficulties.
>>    >
>>    > - I needed to create a directory /var/run/gluster/shared_storage
>>    > manually on all nodes, or the command 'gluster nfs-ganesha enable' would
>>    > fail with the following error:
>>    > [2015-09-18 13:13:34.690416] E [MSGID: 106032]
>>    > [glusterd-ganesha.c:708:pre_setup] 0-THIS->name: mkdir() failed on path
>>    > /var/run/gluster/shared_storage/nfs-ganesha, [No such file or directory]
>>    >
>>    > - Then I found out that the command connects to the arbiter node as
>>    > well, but obviously I don't want to set up NFS-Ganesha there. Is it
>>    > actually possible to setup NFS-Ganesha HA with an arbiter node? If it's
>>    > possible, is there any documentation on how to do that?
>>    >
>>
>>    Please send the /etc/ganesha/ganesha-ha.conf file you're using.
>>
>>    Probably you have included the arbiter in your HA config; that would be
>>    a mistake.
>>
>>    --
>>    Kaleb
>>
>> --
>> Tiemen Ruiten
>> Systems Engineer
>> R&D Media
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>
> --
> Tiemen Ruiten
> Systems Engineer
> R&D Media
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
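A minimal shell sketch of the prerequisites discussed above, assuming Gluster 3.7-era CLI on CentOS 7. The package list and the example volume name "myvol" are assumptions rather than details from the thread; the guide Soumya links above remains the authoritative sequence.

    # 1. Shared storage volume: create and mount it on all pool members (the CLI Soumya mentions).
    gluster volume set all cluster.enable-shared-storage enable

    # 2. Disable Gluster-NFS on the volume that will be exported through NFS-Ganesha.
    gluster volume set myvol nfs.disable on

    # 3. Install the HA stack on the nodes that will run NFS-Ganesha (not the arbiter).
    #    Exact package names may differ per repository.
    yum install -y corosync pacemaker pcs nfs-ganesha nfs-ganesha-gluster glusterfs-ganesha

    # 4. Passwordless root SSH between the HA nodes (Tiemen notes this was missing at first).
    ssh-keygen -t rsa          # if no key exists yet
    ssh-copy-id root@cobalt
    ssh-copy-id root@iron

    # 5. pcsd must be running before "gluster nfs-ganesha enable" (it was started manually in the thread).
    systemctl enable pcsd && systemctl start pcsd

    # 6. Make sure every node, including the arbiter, resolves all hostnames (DNS or /etc/hosts),
    #    then kick off the HA setup from one node:
    gluster nfs-ganesha enable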
Tiemen Ruiten
2015-Sep-22 09:05 UTC
[Gluster-users] Fwd: nfs-ganesha HA with arbiter volume
I had missed setting up passwordless SSH auth for the root user. However, fixing that did not make a difference. After verifying the prerequisites, I issued gluster nfs-ganesha enable on node cobalt:

Sep 22 10:19:56 cobalt systemd: Starting Preprocess NFS configuration...
Sep 22 10:19:56 cobalt systemd: Starting RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Reached target RPC Port Mapper.
Sep 22 10:19:56 cobalt systemd: Starting Host and Network Name Lookups.
Sep 22 10:19:56 cobalt systemd: Reached target Host and Network Name Lookups.
Sep 22 10:19:56 cobalt systemd: Starting RPC bind service...
Sep 22 10:19:56 cobalt systemd: Started Preprocess NFS configuration.
Sep 22 10:19:56 cobalt systemd: Started RPC bind service.
Sep 22 10:19:56 cobalt systemd: Starting NFS status monitor for NFSv2/3 locking....
Sep 22 10:19:56 cobalt rpc.statd[2666]: Version 1.3.0 starting
Sep 22 10:19:56 cobalt rpc.statd[2666]: Flags: TI-RPC
Sep 22 10:19:56 cobalt systemd: Started NFS status monitor for NFSv2/3 locking..
Sep 22 10:19:56 cobalt systemd: Starting NFS-Ganesha file server...
Sep 22 10:19:56 cobalt systemd: Started NFS-Ganesha file server.
Sep 22 10:19:56 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit capabilities (legacy support in use)
Sep 22 10:19:56 cobalt rpc.statd[2666]: Received SM_UNMON_ALL request from cobalt.int.rdmedia.com while not monitoring any hosts
Sep 22 10:19:56 cobalt logger: setting up rd-ganesha-ha
Sep 22 10:19:56 cobalt logger: setting up cluster rd-ganesha-ha with the following cobalt iron
Sep 22 10:19:57 cobalt systemd: Stopped Pacemaker High Availability Cluster Manager.
Sep 22 10:19:57 cobalt systemd: Stopped Corosync Cluster Engine.
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Reloading.
Sep 22 10:19:57 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 22 10:19:57 cobalt systemd: Starting Corosync Cluster Engine...
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
Sep 22 10:19:57 cobalt corosync[2815]: [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Sep 22 10:19:57 cobalt corosync[2816]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] The network interface [10.100.30.37] is now up.
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded: corosync configuration map access [0]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cmap
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded: corosync configuration service [1]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cfg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: cpg
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded: corosync profile loading service [4]
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Using quorum provider corosync_votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: votequorum
Sep 22 10:19:58 cobalt corosync[2816]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 22 10:19:58 cobalt corosync[2816]: [QB    ] server name: quorum
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member {10.100.30.37}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] adding new UDPU member {10.100.30.38}
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership (10.100.30.37:140) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [TOTEM ] A new membership (10.100.30.37:148) was formed. Members joined: 1
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 22 10:19:58 cobalt corosync[2816]: [QUORUM] Members[0]:
Sep 22 10:19:58 cobalt corosync[2816]: [MAIN  ] Completed service synchronization, ready to provide service.
*Sep 22 10:21:27 cobalt systemd: corosync.service operation timed out. Terminating.*
*Sep 22 10:21:27 cobalt corosync: Starting Corosync Cluster Engine (corosync):*
*Sep 22 10:21:27 cobalt systemd: Failed to start Corosync Cluster Engine.*
*Sep 22 10:21:27 cobalt systemd: Unit corosync.service entered failed state.*
Sep 22 10:21:32 cobalt logger: warning: pcs property set no-quorum-policy=ignore failed
Sep 22 10:21:32 cobalt logger: warning: pcs property set stonith-enabled=false failed
Sep 22 10:21:32 cobalt logger: warning: pcs resource create nfs_start ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource delete nfs_start-clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-mon ganesha_mon --clone failed
Sep 22 10:21:33 cobalt logger: warning: pcs resource create nfs-grace ganesha_grace --clone failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.101 cidr_netmask=32 op monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order cobalt-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order nfs-grace-clone then cobalt-cluster_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning pcs resource create iron-cluster_ip-1 ocf:heartbeat:IPaddr ip=10.100.30.102 cidr_netmask=32 op monitor interval=15s failed
Sep 22 10:21:34 cobalt logger: warning: pcs resource create iron-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint colocation add iron-cluster_ip-1 with iron-trigger_ip-1 failed
Sep 22 10:21:34 cobalt logger: warning: pcs constraint order iron-trigger_ip-1 then nfs-grace-clone failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint order nfs-grace-clone then iron-cluster_ip-1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers iron=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers cobalt=2000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers cobalt=1000 failed
Sep 22 10:21:35 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers iron=2000 failed
Sep 22 10:21:35 cobalt logger: warning pcs cluster cib-push /tmp/tmp.yqLT4m75WG failed

Notice the failed corosync service in bold. I can't find any logs pointing to a reason. Starting it manually is not a problem:

Sep 22 10:35:06 cobalt corosync: Starting Corosync Cluster Engine (corosync): [ OK ]

Then I noticed pacemaker was not running on both nodes.
Started it manually and saw the following in /var/log/messages on the other node:

Sep 22 10:36:43 iron cibadmin[4654]: notice: Invoked: /usr/sbin/cibadmin --replace -o configuration -V --xml-pipe
Sep 22 10:36:43 iron crmd[4617]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Sep 22 10:36:44 iron pengine[4616]: notice: On loss of CCM Quorum: Ignore
Sep 22 10:36:44 iron pengine[4616]: error: Resource start-up disabled since no STONITH resources have been defined
Sep 22 10:36:44 iron pengine[4616]: error: Either configure some or disable STONITH with the stonith-enabled option
Sep 22 10:36:44 iron pengine[4616]: error: NOTE: Clusters with shared data need STONITH to ensure data integrity
Sep 22 10:36:44 iron pengine[4616]: notice: Delaying fencing operations until there are resources to manage
Sep 22 10:36:44 iron pengine[4616]: warning: Node iron is unclean!
Sep 22 10:36:44 iron pengine[4616]: notice: Cannot fence unclean nodes until quorum is attained (or no-quorum-policy is set to ignore)
Sep 22 10:36:44 iron pengine[4616]: warning: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-20.bz2
Sep 22 10:36:44 iron pengine[4616]: notice: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
Sep 22 10:36:44 iron crmd[4617]: notice: Transition 2 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-20.bz2): Complete
Sep 22 10:36:44 iron crmd[4617]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

I'm starting to think there is some leftover config somewhere from all these attempts. Is there a way to completely reset all config related to NFS-Ganesha and start over?

On 22 September 2015 at 09:04, Soumya Koduri <skoduri at redhat.com> wrote:

> Hi Tiemen,
>
> Have added the steps to configure HA NFS in the below doc. Please verify
> if you have all the pre-requisites done & steps performed right.
>
> https://github.com/soumyakoduri/glusterdocs/blob/ha_guide/Administrator%20Guide/Configuring%20HA%20NFS%20Server.md
>
> Thanks,
> Soumya
>
> [...]
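The pengine errors above correspond to the two cluster properties that the ganesha-ha.sh run reported as failed earlier (stonith-enabled and no-quorum-policy). A minimal sketch of applying and verifying them by hand once corosync and pacemaker are actually running, using only commands that already appear in the logs plus standard pcs/pacemaker status tools:

    # Run on one HA node once corosync and pacemaker are up on both nodes.
    systemctl start corosync pacemaker pcsd   # start the stack by hand if the script's attempt timed out

    # The two properties the ganesha-ha.sh run reported as "failed":
    pcs property set stonith-enabled=false    # silences the "no STONITH resources" errors (test setups only)
    pcs property set no-quorum-policy=ignore  # matches the script's intent for a two-node cluster

    # Verify:
    pcs status                                # nodes online, properties applied, resources (if any)
    crm_verify -L -V                          # the check the pengine log itself suggests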
--
Tiemen Ruiten
Systems Engineer
R&D Media
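On the open question of wiping the half-built HA configuration and starting over, a rough sketch under stated assumptions (Gluster 3.7-era CLI and pcs 0.9 on CentOS 7). `pcs cluster destroy` is destructive, and the exact teardown supported by ganesha-ha.sh should be checked against the installed version:

    # Hedged sketch: reset a half-configured nfs-ganesha HA setup and retry.
    gluster nfs-ganesha disable               # counterpart of the enable command used in the thread

    # On each HA node (cobalt, iron):
    systemctl stop nfs-ganesha pacemaker corosync
    pcs cluster destroy                       # removes the local corosync/pacemaker cluster config (destructive)
    rm -f /etc/corosync/corosync.conf         # usually already removed by "pcs cluster destroy"
    rm -f /var/lib/pacemaker/cib/cib*         # drop any leftover CIB from earlier attempts

    # Then fix the underlying corosync start-up problem and re-run from one node:
    gluster nfs-ganesha enable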