Adam Ru
2017-May-05 14:34 UTC
[Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
Hi Soumya,

Thank you for the answer. Enabling Pacemaker? Yes, you're completely right, I didn't do it. Thank you.

I spent some time testing and I have some results. This is what I did:

- Clean installation of CentOS 7.3 with all updates; 3 nodes, resolvable IPs and VIPs
- Stopped firewalld (just for testing)
- Installed "centos-release-gluster" to get the "centos-gluster310" repo and installed the following (nothing else):
--- glusterfs-server
--- glusterfs-ganesha
- Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem and secret.pem.pub on all nodes)
- systemctl enable and start glusterd
- gluster peer probe <other nodes>
- gluster volume set all cluster.enable-shared-storage enable
- systemctl enable and start pcsd.service
- systemctl enable pacemaker.service (cannot be started at this moment)
- Set password for the hacluster user on all nodes
- pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
- mkdir /var/run/gluster/shared_storage/nfs-ganesha/
- touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not sure if needed)
- vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and insert configuration (a sketch of this file follows this list)
- Try to list files on the other nodes: ls /var/run/gluster/shared_storage/nfs-ganesha/
- gluster nfs-ganesha enable
- Check on the other nodes that nfs-ganesha.service is running and "pcs status" shows started resources
- gluster volume create mynewshare replica 3 transport tcp node1:/<dir> node2:/<dir> node3:/<dir>
- gluster volume start mynewshare
- gluster vol set mynewshare ganesha.enable on
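Since two of these steps are terse, here are sketches. For the passwordless SSH step, the key pair can be generated the way the integration guide describes (commands per the guide; verify them against the guide for your Gluster version, use an empty passphrase, and note node2/node3 below are placeholders):

# on one node; glusterd expects the key at exactly this path
ssh-keygen -f /var/lib/glusterd/nfs/secret.pem
# install the public key for root on every node
ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node2
ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node3
# then copy secret.pem and secret.pem.pub to the same path on all nodes

And the ganesha-ha.conf I inserted was only a few lines. A minimal sketch (node names and VIP addresses are placeholders, HA_VOL_SERVER is omitted since it is no longer needed, and the exact VIP_* key format should be checked against the guide for your version):

# /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf
# name of the HA cluster (must be unique in the subnet)
HA_NAME="ganesha-ha-demo"
# comma-separated list of the nodes in the HA cluster
HA_CLUSTER_NODES="node1,node2,node3"
# one virtual IP per node (placeholder addresses)
VIP_node1="192.168.0.201"
VIP_node2="192.168.0.202"
VIP_node3="192.168.0.203"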
After these steps, all VIPs are pingable and I can mount node1:/mynewshare.

Funny thing is that pacemaker.service is disabled again (something disabled it). This is the status of the (I think) important services:

systemctl list-units --all
# corosync.service             loaded active   running
# glusterd.service             loaded active   running
# nfs-config.service           loaded inactive dead
# nfs-ganesha-config.service   loaded inactive dead
# nfs-ganesha-lock.service     loaded active   running
# nfs-ganesha.service          loaded active   running
# nfs-idmapd.service           loaded inactive dead
# nfs-mountd.service           loaded inactive dead
# nfs-server.service           loaded inactive dead
# nfs-utils.service            loaded inactive dead
# pacemaker.service            loaded active   running
# pcsd.service                 loaded active   running

systemctl list-unit-files --all
# corosync-notifyd.service     disabled
# corosync.service             disabled
# glusterd.service             enabled
# glusterfsd.service           disabled
# nfs-blkmap.service           disabled
# nfs-config.service           static
# nfs-ganesha-config.service   static
# nfs-ganesha-lock.service     static
# nfs-ganesha.service          disabled
# nfs-idmap.service            static
# nfs-idmapd.service           static
# nfs-lock.service             static
# nfs-mountd.service           static
# nfs-rquotad.service          disabled
# nfs-secure-server.service    static
# nfs-secure.service           static
# nfs-server.service           disabled
# nfs-utils.service            static
# nfs.service                  disabled
# nfslock.service              static
# pacemaker.service            disabled
# pcsd.service                 enabled

I enabled pacemaker again on all nodes and restarted all nodes one by one.

After the reboot all VIPs are gone and I can see that nfs-ganesha.service isn't running. When I start it on at least two nodes, the VIPs are pingable again and I can mount NFS again. But there is still some issue in the setup, because when I check nfs-ganesha-lock.service I get:

systemctl -l status nfs-ganesha-lock.service
● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2017-05-05 13:43:37 UTC; 31min ago
  Process: 6203 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)

May 05 13:43:37 node0.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
May 05 13:43:37 node0.localdomain rpc.statd[6205]: Version 1.3.0 starting
May 05 13:43:37 node0.localdomain rpc.statd[6205]: Flags: TI-RPC
May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open directory sm: Permission denied
May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open /var/lib/nfs/statd/state: Permission denied
May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
May 05 13:43:37 node0.localdomain systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
May 05 13:43:37 node0.localdomain systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service failed.

Thank you.

Kind regards,
Adam

On Wed, May 3, 2017 at 10:32 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

> Hi,
>
> Same here: when I reboot a node I have to manually execute "pcs cluster
> start gluster01", even though pcsd is already enabled and started.
>
> Gluster 3.8.11
> CentOS 7.3 latest
> Installed using the CentOS Storage SIG repository
>
> --
> Respectfully
> Mahdi A. Mahdi
>
> From: gluster-users-bounces at gluster.org on behalf of Adam Ru <ad.ruckel at gmail.com>
> Sent: Wednesday, May 3, 2017 12:09:58 PM
> To: Soumya Koduri
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
>
> Hi Soumya,
>
> thank you very much for your reply.
>
> I enabled pcsd during setup, and after the reboot, during troubleshooting,
> I manually started it and checked the resources (pcs status). They were
> not running. I didn't find what was wrong, but I'm going to try it again.
>
> I've thoroughly checked
> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
> and I can confirm that I followed all steps, with one exception. I
> installed the following RPMs:
> glusterfs-server
> glusterfs-fuse
> glusterfs-cli
> glusterfs-ganesha
> nfs-ganesha-xfs
>
> and the guide referenced above specifies:
> glusterfs-server
> glusterfs-api
> glusterfs-ganesha
>
> glusterfs-api is a dependency of one of the RPMs that I installed, so this
> is not a problem. But I cannot find any mention of installing nfs-ganesha-xfs.
>
> I'll try to set up the whole environment again without installing
> nfs-ganesha-xfs (I assume glusterfs-ganesha has all required binaries).
>
> Again, thank you for your time answering my previous message.
>
> Kind regards,
> Adam
>
> On Tue, May 2, 2017 at 8:49 AM, Soumya Koduri <skoduri at redhat.com> wrote:
>
>> Hi,
>>
>> On 05/02/2017 01:34 AM, Rudolf wrote:
>>
>>> Hi Gluster users,
>>>
>>> First, I'd like to thank you all for this amazing open-source! Thank you!
>>>
>>> I'm working on a home project: three servers with Gluster and
>>> NFS-Ganesha. My goal is to create an HA NFS share with three copies of
>>> each file on each server.
>>>
>>> My systems are CentOS 7.3 Minimal install with the latest updates and
>>> the most current RPMs from the "centos-gluster310" repository.
>>>
>>> I followed this tutorial:
>>> http://blog.gluster.org/2015/10/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
>>> (the second half, which describes the multi-node HA setup)
>>>
>>> with a few exceptions:
>>>
>>> 1. All RPMs are from the "centos-gluster310" repo that is installed by
>>> "yum -y install centos-release-gluster"
>>> 2. I have three nodes (not four) with a "replica 3" volume.
>>> 3. I created an empty ganesha.conf and a non-empty ganesha-ha.conf in
>>> "/var/run/gluster/shared_storage/nfs-ganesha/" (the referenced blog post
>>> is outdated; this is now a requirement)
>>> 4. ganesha-ha.conf doesn't have "HA_VOL_SERVER" since this isn't needed
>>> anymore.
>>
>> Please refer to
>> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>>
>> It is being updated with the latest changes to the setup.
>>
>>> When I finish the configuration, all is good. nfs-ganesha.service is
>>> active and running, and from the client I can ping all three VIPs and
>>> I can mount NFS. Copied files are replicated to all nodes.
>>>
>>> But when I restart the nodes (one by one, with a 5 min. delay between)
>>> then I cannot ping or mount (I assume that all VIPs are down). So my
>>> setup definitely isn't HA.
>>>
>>> I found that:
>>> # pcs status
>>> Error: cluster is not currently running on this node
>>
>> This means the pcsd service is not up. Did you enable the pcsd service
>> (systemctl enable pcsd) so that it comes up automatically after reboot?
>> If not, please start it manually.
>>
>>> and nfs-ganesha.service is in inactive state. Btw. I didn't enable
>>> "systemctl enable nfs-ganesha" since I assume that this is something
>>> that Gluster does.
>>
>> Please check /var/log/ganesha.log for any errors/warnings.
>>
>> We recommend not to enable nfs-ganesha.service (by default), as the
>> shared storage (where the ganesha.conf file resides now) should be up
>> and running before nfs-ganesha gets started.
>> So if enabled by default, it could happen that the shared_storage mount
>> point is not yet up, resulting in nfs-ganesha service failure. If you
>> would like to address this, you could have a cron job which keeps
>> checking the mount point health and then starts the nfs-ganesha service.
>>
>> Thanks,
>> Soumya
>>
>>> I assume that my issue is that I followed instructions in a blog post
>>> from 2015/10 that are outdated. Unfortunately I cannot find anything
>>> better; I spent a whole day googling.
>>>
>>> Would you be so kind as to check the instructions in the blog post and
>>> let me know which steps are wrong / outdated? Or do you have more
>>> current instructions for a Gluster+Ganesha setup?
>>>
>>> Thank you.
>>>
>>> Kind regards,
>>> Adam
>
> --
> Adam

--
Adam
Soumya Koduri
2017-May-05 19:10 UTC
[Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
On 05/05/2017 08:04 PM, Adam Ru wrote:

> [...]
>
> Funny thing is that pacemaker.service is disabled again (something
> disabled it).

Yeah, we too observed this recently. We guess that the pcs cluster setup command probably first destroys the existing cluster (if any), which may be disabling pacemaker too.
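Until that is sorted out, a simple guard is to re-check the unit on every node after running "gluster nfs-ganesha enable". A sketch (plain systemd/pcs commands; adjust the node handling to your setup):

# on every node, once the ganesha HA cluster has been set up:
systemctl is-enabled pacemaker || systemctl enable pacemaker

# or let pcs enable the cluster services (corosync + pacemaker)
# on all cluster nodes in one go:
pcs cluster enable --all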
> I enabled pacemaker again on all nodes and restarted all nodes one by one.
>
> After the reboot all VIPs are gone and I can see that nfs-ganesha.service
> isn't running. When I start it on at least two nodes, the VIPs are
> pingable again and I can mount NFS again. But there is still some issue
> in the setup, because when I check nfs-ganesha-lock.service I get:
>
> [...]
> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open
> directory sm: Permission denied
> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open
> /var/lib/nfs/statd/state: Permission denied

Okay, this issue was fixed and the fix should be present in 3.10 too:
https://review.gluster.org/#/c/16433/

Please check /var/log/messages for statd-related errors and cross-check the permissions of that directory. You could manually chown owner:group of the /var/lib/nfs/statd/sm directory for now and then restart the nfs-ganesha* services.
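For example (on stock CentOS 7 the statd state directory is normally owned by rpcuser; treat the owner below as an assumption and verify it on a working node first):

# inspect the ownership rpc.statd complains about
ls -ld /var/lib/nfs/statd /var/lib/nfs/statd/sm
# rpcuser:rpcuser is the usual owner on CentOS 7 (assumption)
chown -R rpcuser:rpcuser /var/lib/nfs/statd
# then restart the ganesha services
systemctl restart nfs-ganesha-lock.service nfs-ganesha.service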
[...]

>> We recommend not to enable nfs-ganesha.service (by default), as the
>> shared storage (where the ganesha.conf file resides now) should be up
>> and running before nfs-ganesha gets started. If you would like to
>> address this, you could have a cron job which keeps checking the mount
>> point health and then starts the nfs-ganesha service.
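A minimal sketch of such a mount-check job (the script path, schedule, and mount point are assumptions; adjust to your setup):

# /etc/cron.d/nfs-ganesha-check -- hypothetical cron entry, runs every minute
* * * * * root /usr/local/sbin/check-ganesha.sh

# /usr/local/sbin/check-ganesha.sh -- hypothetical helper script
#!/bin/sh
# start nfs-ganesha only once the Gluster shared storage is mounted
if mountpoint -q /var/run/gluster/shared_storage && \
   ! systemctl is-active --quiet nfs-ganesha; then
    systemctl start nfs-ganesha
fi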
Thanks,
Soumya