Adam Ru
2017-May-12 12:57 UTC
[Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
Hi Soumya,

Thank you very much for the last response - very useful. I apologize for the delay; I had to find time for another round of testing.

I updated the instructions that I provided in the previous e-mail. *** means that the step was added.

Instructions:
- Clean installation of CentOS 7.3 with all updates, 3x node, resolvable IPs and VIPs
- Stopped firewalld (just for testing)
- *** SELinux in permissive mode (I had to, will explain below)
- Install "centos-release-gluster" to get the "centos-gluster310" repo and install the following (nothing else):
--- glusterfs-server
--- glusterfs-ganesha
- Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem and secret.pem.pub on all nodes)
- systemctl enable and start glusterd
- gluster peer probe <other nodes>
- gluster volume set all cluster.enable-shared-storage enable
- systemctl enable and start pcsd.service
- systemctl enable pacemaker.service (cannot be started at this moment)
- Set password for hacluster user on all nodes
- pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
- mkdir /var/run/gluster/shared_storage/nfs-ganesha/
- touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not sure if needed)
- vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and insert configuration (an example of the format is shown below, after this list)
- Try to list files on the other nodes: ls /var/run/gluster/shared_storage/nfs-ganesha/
- gluster nfs-ganesha enable
- *** systemctl enable pacemaker.service (again, since pacemaker was disabled at this point)
- *** Check owner of "state", "statd", "sm" and "sm.bak" in /var/lib/nfs/ (I had to: chown rpcuser:rpcuser /var/lib/nfs/statd/state)
- Check on the other nodes that nfs-ganesha.service is running and "pcs status" shows started resources
- gluster volume create mynewshare replica 3 transport tcp node1:/<dir> node2:/<dir> node3:/<dir>
- gluster volume start mynewshare
- gluster vol set mynewshare ganesha.enable on
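For reference, a ganesha-ha.conf for a three-node setup like this one is roughly of the following form (the cluster name, node names and VIP addresses here are only placeholders; the NFS-Ganesha integration guide linked later in this thread describes the exact format):

# Name of the HA cluster created by the ganesha setup scripts
HA_NAME="ganesha-ha-demo"
# Nodes participating in the HA cluster (must be resolvable names)
HA_CLUSTER_NODES="node1,node2,node3"
# One virtual IP per node listed above
VIP_node1="192.168.1.201"
VIP_node2="192.168.1.202"
VIP_node3="192.168.1.203"

As noted further down in the thread, HA_VOL_SERVER is no longer required with this version.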
At this moment, this is the status of the important (I think) services:

-- corosync.service disabled
-- corosync-notifyd.service disabled
-- glusterd.service enabled
-- glusterfsd.service disabled
-- pacemaker.service enabled
-- pcsd.service enabled
-- nfs-ganesha.service disabled
-- nfs-ganesha-config.service static
-- nfs-ganesha-lock.service static

-- corosync.service active (running)
-- corosync-notifyd.service inactive (dead)
-- glusterd.service active (running)
-- glusterfsd.service inactive (dead)
-- pacemaker.service active (running)
-- pcsd.service active (running)
-- nfs-ganesha.service active (running)
-- nfs-ganesha-config.service inactive (dead)
-- nfs-ganesha-lock.service active (running)

May I ask you a few questions, please?

1. Could you please confirm that the services above have the correct status/state?

2. When I restart a node, nfs-ganesha is not running. Of course I cannot simply enable it, since it must be started only after the shared storage is mounted. What is the best practice to start it automatically, so I don't have to worry about restarting a node? Should I create a script that checks whether the shared storage is mounted and then starts nfs-ganesha? How do you do this in production?

3. SELinux is an issue, is that a known bug?

When I restart a node and start nfs-ganesha.service with SELinux in permissive mode:

sudo grep 'statd' /var/log/messages
May 12 12:05:46 mynode1 rpc.statd[2415]: Version 1.3.0 starting
May 12 12:05:46 mynode1 rpc.statd[2415]: Flags: TI-RPC
May 12 12:05:46 mynode1 rpc.statd[2415]: Failed to read /var/lib/nfs/statd/state: Success
May 12 12:05:46 mynode1 rpc.statd[2415]: Initializing NSM state
May 12 12:05:52 mynode1 rpc.statd[2415]: Received SM_UNMON_ALL request from mynode1.localdomain while not monitoring any hosts

systemctl status nfs-ganesha-lock.service --full
● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
   Active: active (running) since Fri 2017-05-12 12:05:46 UTC; 1min 43s ago
  Process: 2414 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=0/SUCCESS)
 Main PID: 2415 (rpc.statd)
   CGroup: /system.slice/nfs-ganesha-lock.service
           └─2415 /usr/sbin/rpc.statd --no-notify

May 12 12:05:46 mynode1.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Version 1.3.0 starting
May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Flags: TI-RPC
May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Failed to read /var/lib/nfs/statd/state: Success
May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Initializing NSM state
May 12 12:05:46 mynode1.localdomain systemd[1]: Started NFS status monitor for NFSv2/3 locking..
May 12 12:05:52 mynode1.localdomain rpc.statd[2415]: Received SM_UNMON_ALL request from mynode1.localdomain while not monitoring any hosts

When I restart a node and start nfs-ganesha.service with SELinux in enforcing mode:

sudo grep 'statd' /var/log/messages
May 12 12:14:01 mynode1 rpc.statd[1743]: Version 1.3.0 starting
May 12 12:14:01 mynode1 rpc.statd[1743]: Flags: TI-RPC
May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open directory sm: Permission denied
May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open /var/lib/nfs/statd/state: Permission denied

systemctl status nfs-ganesha-lock.service --full
● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2017-05-12 12:14:01 UTC; 1min 21s ago
  Process: 1742 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)

May 12 12:14:01 mynode1.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Version 1.3.0 starting
May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Flags: TI-RPC
May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Failed to open directory sm: Permission denied
May 12 12:14:01 mynode1.localdomain systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
May 12 12:14:01 mynode1.localdomain systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
May 12 12:14:01 mynode1.localdomain systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
May 12 12:14:01 mynode1.localdomain systemd[1]: nfs-ganesha-lock.service failed.

On Fri, May 5, 2017 at 8:10 PM, Soumya Koduri <skoduri at redhat.com> wrote:
>
> On 05/05/2017 08:04 PM, Adam Ru wrote:
>>
>> Hi Soumya,
>>
>> Thank you for the answer.
>>
>> Enabling Pacemaker? Yes, you're completely right, I didn't do it. Thank you.
>>
>> I spent some time by testing and I have some results. This is what I did:
>>
>> - Clean installation of CentOS 7.3 with all updates, 3x node, resolvable IPs and VIPs
>> - Stopped firewalld (just for testing)
>> - Install "centos-release-gluster" to get "centos-gluster310" repo and install following (nothing else):
>> --- glusterfs-server
>> --- glusterfs-ganesha
>> - Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem and secret.pem.pub on all nodes)
>> - systemctl enable and start glusterd
>> - gluster peer probe <other nodes>
>> - gluster volume set all cluster.enable-shared-storage enable
>> - systemctl enable and start pcsd.service
>> - systemctl enable pacemaker.service (cannot be started at this moment)
>> - Set password for hacluster user on all nodes
>> - pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
>> - mkdir /var/run/gluster/shared_storage/nfs-ganesha/
>> - touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not sure if needed)
>> - vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and insert configuration
>> - Try list files on other nodes: ls /var/run/gluster/shared_storage/nfs-ganesha/
>> - gluster nfs-ganesha enable
>> - Check on other nodes that nfs-ganesha.service is running and "pcs status" shows started resources
>> - gluster volume create mynewshare replica 3 transport tcp node1:/<dir> node2:/<dir> node3:/<dir>
>> - gluster volume start mynewshare
>> - gluster vol set mynewshare ganesha.enable on
>>
>> After these steps, all VIPs are pingable and I can mount node1:/mynewshare
>>
>> Funny thing is that pacemaker.service is disabled again (something disabled it). This is status of important (I think) services:
>
> yeah. We too had observed this recently. We guess probably pcs cluster setup command first destroys existing cluster (if any) which may be disabling pacemaker too.
>
>> systemctl list-units --all
>> # corosync.service loaded active running
>> # glusterd.service loaded active running
>> # nfs-config.service loaded inactive dead
>> # nfs-ganesha-config.service loaded inactive dead
>> # nfs-ganesha-lock.service loaded active running
>> # nfs-ganesha.service loaded active running
>> # nfs-idmapd.service loaded inactive dead
>> # nfs-mountd.service loaded inactive dead
>> # nfs-server.service loaded inactive dead
>> # nfs-utils.service loaded inactive dead
>> # pacemaker.service loaded active running
>> # pcsd.service loaded active running
>>
>> systemctl list-unit-files --all
>> # corosync-notifyd.service disabled
>> # corosync.service disabled
>> # glusterd.service enabled
>> # glusterfsd.service disabled
>> # nfs-blkmap.service disabled
>> # nfs-config.service static
>> # nfs-ganesha-config.service static
>> # nfs-ganesha-lock.service static
>> # nfs-ganesha.service disabled
>> # nfs-idmap.service static
>> # nfs-idmapd.service static
>> # nfs-lock.service static
>> # nfs-mountd.service static
>> # nfs-rquotad.service disabled
>> # nfs-secure-server.service static
>> # nfs-secure.service static
>> # nfs-server.service disabled
>> # nfs-utils.service static
>> # nfs.service disabled
>> # nfslock.service static
>> # pacemaker.service disabled
>> # pcsd.service enabled
>>
>> I enabled pacemaker again on all nodes and restart all nodes one by one.
>>
>> After reboot all VIPs are gone and I can see that nfs-ganesha.service isn't running. When I start it on at least two nodes then VIPs are pingable again and I can mount NFS again.
>> But there is still some issue in the setup because when I check nfs-ganesha-lock.service I get:
>>
>> systemctl -l status nfs-ganesha-lock.service
>> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>>    Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
>>    Active: failed (Result: exit-code) since Fri 2017-05-05 13:43:37 UTC; 31min ago
>>   Process: 6203 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)
>>
>> May 05 13:43:37 node0.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Version 1.3.0 starting
>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Flags: TI-RPC
>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open directory sm: Permission denied
>
> Okay this issue was fixed and the fix should be present in 3.10 too -
> https://review.gluster.org/#/c/16433/
>
> Please check '/var/log/messages' for statd related errors and cross-check permissions of that directory. You could manually chown owner:group of /var/lib/nfs/statd/sm directory for now and then restart nfs-ganesha* services.
>
> Thanks,
> Soumya
>
>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open /var/lib/nfs/statd/state: Permission denied
>> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
>> May 05 13:43:37 node0.localdomain systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
>> May 05 13:43:37 node0.localdomain systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
>> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service failed.
>>
>> Thank you,
>>
>> Kind regards,
>>
>> Adam
>>
>> On Wed, May 3, 2017 at 10:32 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>
>> Hi,
>>
>> Same here, when i reboot the node i have to manually execute "pcs cluster start gluster01" and pcsd already enabled and started.
>>
>> Gluster 3.8.11
>> Centos 7.3 latest
>> Installed using CentOS Storage SIG repository
>>
>> --
>> Respectfully
>> Mahdi A. Mahdi
>>
>> ------------------------------------------------------------------------
>> From: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Adam Ru <ad.ruckel at gmail.com>
>> Sent: Wednesday, May 3, 2017 12:09:58 PM
>> To: Soumya Koduri
>> Cc: gluster-users at gluster.org
>> Subject: Re: [Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
>>
>> Hi Soumya,
>>
>> thank you very much for your reply.
>>
>> I enabled pcsd during setup and after reboot during troubleshooting I manually started it and checked resources (pcs status). They were not running. I didn't find what was wrong but I'm going to try it again.
>>
>> I've thoroughly checked
>> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>> and I can confirm that I followed all steps with one exception.
>> I installed following RPMs:
>> glusterfs-server
>> glusterfs-fuse
>> glusterfs-cli
>> glusterfs-ganesha
>> nfs-ganesha-xfs
>>
>> and the guide referenced above specifies:
>> glusterfs-server
>> glusterfs-api
>> glusterfs-ganesha
>>
>> glusterfs-api is a dependency of one of RPMs that I installed so this is not a problem. But I cannot find any mention to install nfs-ganesha-xfs.
>>
>> I'll try to setup the whole environment again without installing nfs-ganesha-xfs (I assume glusterfs-ganesha has all required binaries).
>>
>> Again, thank you for your time to answer my previous message.
>>
>> Kind regards,
>> Adam
>>
>> On Tue, May 2, 2017 at 8:49 AM, Soumya Koduri <skoduri at redhat.com> wrote:
>>
>> Hi,
>>
>> On 05/02/2017 01:34 AM, Rudolf wrote:
>>
>> Hi Gluster users,
>>
>> First, I'd like to thank you all for this amazing open-source! Thank you!
>>
>> I'm working on home project - three servers with Gluster and NFS-Ganesha. My goal is to create HA NFS share with three copies of each file on each server.
>>
>> My systems are CentOS 7.3 Minimal install with the latest updates and the most current RPMs from "centos-gluster310" repository.
>>
>> I followed this tutorial:
>> http://blog.gluster.org/2015/10/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
>> (second half that describes multi-node HA setup)
>>
>> with a few exceptions:
>>
>> 1. All RPMs are from "centos-gluster310" repo that is installed by "yum -y install centos-release-gluster"
>> 2. I have three nodes (not four) with "replica 3" volume.
>> 3. I created empty ganesha.conf and not empty ganesha-ha.conf in "/var/run/gluster/shared_storage/nfs-ganesha/" (referenced blog post is outdated, this is now requirement)
>> 4. ganesha-ha.conf doesn't have "HA_VOL_SERVER" since this isn't needed anymore.
>>
>> Please refer to
>> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>>
>> It is being updated with latest changes happened wrt setup.
>>
>> When I finish configuration, all is good. nfs-ganesha.service is active and running and from client I can ping all three VIPs and I can mount NFS. Copied files are replicated to all nodes.
>>
>> But when I restart nodes (one by one, with 5 min. delay between) then I cannot ping or mount (I assume that all VIPs are down). So my setup definitely isn't HA.
>>
>> I found that:
>> # pcs status
>> Error: cluster is not currently running on this node
>>
>> This means pcsd service is not up. Did you enable (systemctl enable pcsd) pcsd service so that is comes up post reboot automatically. If not please start it manually.
>>
>> and nfs-ganesha.service is in inactive state. Btw. I didn't enable "systemctl enable nfs-ganesha" since I assume that this is something that Gluster does.
>>
>> Please check /var/log/ganesha.log for any errors/warnings.
>>
>> We recommend not to enable nfs-ganesha.service (by default), as the shared storage (where the ganesha.conf file resides now) should be up and running before nfs-ganesha gets started.
>> So if enabled by default it could happen that shared_storage mount point is not yet up and it resulted in nfs-ganesha service failure. If you would like to address this, you could have a cron job which keeps checking the mount point health and then start nfs-ganesha service.
>>
>> Thanks,
>> Soumya
>>
>> I assume that my issue is that I followed instructions in blog post from 2015/10 that are outdated. Unfortunately I cannot find anything better - I spent whole day by googling.
>>
>> Would you be so kind and check the instructions in blog post and let me know what steps are wrong / outdated? Or please do you have more current instructions for Gluster+Ganesha setup?
>>
>> Thank you.
>>
>> Kind regards,
>> Adam
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> --
>> Adam
>>
>> --
>> Adam

--
Adam
Soumya Koduri
2017-May-15 10:56 UTC
[Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
On 05/12/2017 06:27 PM, Adam Ru wrote:
> May I ask you a few questions, please?
>
> 1. Could you please confirm that the services above have the correct status/state?

Looks good to the best of my knowledge.

> 2. When I restart a node, nfs-ganesha is not running. Of course I cannot simply enable it, since it must be started only after the shared storage is mounted. What is the best practice to start it automatically, so I don't have to worry about restarting a node? Should I create a script that checks whether the shared storage is mounted and then starts nfs-ganesha? How do you do this in production?

That's right. We have plans to address this in the near future (probably by having a new .service which mounts shared_storage before starting nfs-ganesha).
But until then, yes, having a custom-defined script to do so is the only way to automate it.
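A rough, untested sketch of such a script (the shared-storage mount point is the one used earlier in this thread; the polling interval and timeout are arbitrary, and it could be run from cron or at boot):

#!/bin/bash
# Wait for the gluster shared storage to be mounted, then start nfs-ganesha.
mnt=/var/run/gluster/shared_storage

for _ in $(seq 1 60); do
    if mountpoint -q "$mnt"; then
        systemctl start nfs-ganesha
        exit $?
    fi
    sleep 5
done

echo "$mnt is still not mounted; not starting nfs-ganesha" >&2
exit 1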
> 3. SELinux is an issue, is that a known bug?
>
> When I restart a node and start nfs-ganesha.service with SELinux in enforcing mode:
>
> May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open directory sm: Permission denied
> May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open /var/lib/nfs/statd/state: Permission denied

Can't remember right now. Could you please paste the AVCs you get, and your SELinux package versions? Or preferably, please file a bug; we can get the details verified by the SELinux team.
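For example, something along these lines should capture both (assuming auditd is running; ausearch comes with the audit package):

# Recent SELinux denials involving rpc.statd / the statd state files
ausearch -m avc -ts recent | grep -i statd

# Fallback if ausearch is not installed
grep 'avc: .*denied' /var/log/audit/audit.log | grep -i statd

# Package versions that are useful for triage
rpm -q selinux-policy selinux-policy-targeted nfs-utils nfs-ganesha glusterfs-ganesha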
Thanks,
Soumya