Soumya Koduri
2017-May-15  10:56 UTC
[Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
On 05/12/2017 06:27 PM, Adam Ru wrote:
> Hi Soumya,
>
> Thank you very much for your last response; it was very useful.
>
> I apologize for the delay; I had to find time for more testing.
>
> I updated the instructions that I provided in my previous e-mail. *** marks the steps that were added.
>
> Instructions:
> - Clean installation of CentOS 7.3 with all updates, 3x node, resolvable IPs and VIPs
> - Stopped firewalld (just for testing)
> - *** SELinux in permissive mode (I had to, will explain below)
> - Install "centos-release-gluster" to get the "centos-gluster310" repo and install the following (nothing else):
> --- glusterfs-server
> --- glusterfs-ganesha
> - Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem and secret.pem.pub on all nodes)
> - systemctl enable and start glusterd
> - gluster peer probe <other nodes>
> - gluster volume set all cluster.enable-shared-storage enable
> - systemctl enable and start pcsd.service
> - systemctl enable pacemaker.service (cannot be started at this moment)
> - Set password for hacluster user on all nodes
> - pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
> - mkdir /var/run/gluster/shared_storage/nfs-ganesha/
> - touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not sure if needed)
> - vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and insert configuration
> - Try to list files on other nodes: ls /var/run/gluster/shared_storage/nfs-ganesha/
> - gluster nfs-ganesha enable
> - *** systemctl enable pacemaker.service (again, since pacemaker was disabled at this point)
> - *** Check owner of "state", "statd", "sm" and "sm.bak" in /var/lib/nfs/ (I had to: chown rpcuser:rpcuser /var/lib/nfs/statd/state)
> - Check on other nodes that nfs-ganesha.service is running and "pcs status" shows started resources
> - gluster volume create mynewshare replica 3 transport tcp node1:/<dir> node2:/<dir> node3:/<dir>
> - gluster volume start mynewshare
> - gluster vol set mynewshare ganesha.enable on
>
> At this moment, this is the status of the important (I think) services:
>
> -- corosync.service disabled
> -- corosync-notifyd.service disabled
> -- glusterd.service enabled
> -- glusterfsd.service disabled
> -- pacemaker.service enabled
> -- pcsd.service enabled
> -- nfs-ganesha.service disabled
> -- nfs-ganesha-config.service static
> -- nfs-ganesha-lock.service static
>
> -- corosync.service active (running)
> -- corosync-notifyd.service inactive (dead)
> -- glusterd.service active (running)
> -- glusterfsd.service inactive (dead)
> -- pacemaker.service active (running)
> -- pcsd.service active (running)
> -- nfs-ganesha.service active (running)
> -- nfs-ganesha-config.service inactive (dead)
> -- nfs-ganesha-lock.service active (running)
>
> May I ask you a few questions, please?
>
> 1. Could you please confirm that the services above have the correct status/state?

Looks good to the best of my knowledge.

>
> 2. When I restart a node, nfs-ganesha is not running. Of course, I cannot enable it, since it needs to be started after the shared storage is mounted. What is the best practice for starting it automatically so I don't have to worry about restarting a node? Should I create a script that checks whether the shared storage is mounted and then starts nfs-ganesha? How do you do this in production?

That's right. We have plans to address this in the near future (probably by having a new .service which mounts shared_storage before starting nfs-ganesha). Until then, yes, a custom script that does this is the only way to automate it.

>
> 3. SELinux is an issue; is that a known bug?
>
> When I restart a node and start nfs-ganesha.service with SELinux in permissive mode:
>
> sudo grep 'statd' /var/log/messages
> May 12 12:05:46 mynode1 rpc.statd[2415]: Version 1.3.0 starting
> May 12 12:05:46 mynode1 rpc.statd[2415]: Flags: TI-RPC
> May 12 12:05:46 mynode1 rpc.statd[2415]: Failed to read /var/lib/nfs/statd/state: Success
> May 12 12:05:46 mynode1 rpc.statd[2415]: Initializing NSM state
> May 12 12:05:52 mynode1 rpc.statd[2415]: Received SM_UNMON_ALL request from mynode1.localdomain while not monitoring any hosts
>
> systemctl status nfs-ganesha-lock.service --full
> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>    Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
>    Active: active (running) since Fri 2017-05-12 12:05:46 UTC; 1min 43s ago
>   Process: 2414 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=0/SUCCESS)
>  Main PID: 2415 (rpc.statd)
>    CGroup: /system.slice/nfs-ganesha-lock.service
>            └─2415 /usr/sbin/rpc.statd --no-notify
>
> May 12 12:05:46 mynode1.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Version 1.3.0 starting
> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Flags: TI-RPC
> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Failed to read /var/lib/nfs/statd/state: Success
> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Initializing NSM state
> May 12 12:05:46 mynode1.localdomain systemd[1]: Started NFS status monitor for NFSv2/3 locking..
> May 12 12:05:52 mynode1.localdomain rpc.statd[2415]: Received SM_UNMON_ALL request from mynode1.localdomain while not monitoring any hosts
>
> When I restart a node and start nfs-ganesha.service with SELinux in enforcing mode:
>
> sudo grep 'statd' /var/log/messages
> May 12 12:14:01 mynode1 rpc.statd[1743]: Version 1.3.0 starting
> May 12 12:14:01 mynode1 rpc.statd[1743]: Flags: TI-RPC
> May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open directory sm: Permission denied
> May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open /var/lib/nfs/statd/state: Permission denied
>
> systemctl status nfs-ganesha-lock.service --full
> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>    Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
>    Active: failed (Result: exit-code) since Fri 2017-05-12 12:14:01 UTC; 1min 21s ago
>   Process: 1742 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)
>
> May 12 12:14:01 mynode1.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
> May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Version 1.3.0 starting
> May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Flags: TI-RPC
> May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Failed to open directory sm: Permission denied
> May 12 12:14:01 mynode1.localdomain systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
> May 12 12:14:01 mynode1.localdomain systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
> May 12 12:14:01 mynode1.localdomain systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
> May 12 12:14:01 mynode1.localdomain systemd[1]: nfs-ganesha-lock.service failed.

I can't remember right now. Could you please paste the AVCs you get and the SELinux package versions? Or, preferably, please file a bug. We can get the details verified by the SELinux team members.
Thanks,
Soumya

>
> On Fri, May 5, 2017 at 8:10 PM, Soumya Koduri <skoduri at redhat.com> wrote:
>>
>> On 05/05/2017 08:04 PM, Adam Ru wrote:
>>>
>>> Hi Soumya,
>>>
>>> Thank you for the answer.
>>>
>>> Enabling Pacemaker? Yes, you're completely right, I didn't do it. Thank you.
>>>
>>> I spent some time testing and I have some results. This is what I did:
>>>
>>> - Clean installation of CentOS 7.3 with all updates, 3x node, resolvable IPs and VIPs
>>> - Stopped firewalld (just for testing)
>>> - Install "centos-release-gluster" to get the "centos-gluster310" repo and install the following (nothing else):
>>> --- glusterfs-server
>>> --- glusterfs-ganesha
>>> - Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem and secret.pem.pub on all nodes)
>>> - systemctl enable and start glusterd
>>> - gluster peer probe <other nodes>
>>> - gluster volume set all cluster.enable-shared-storage enable
>>> - systemctl enable and start pcsd.service
>>> - systemctl enable pacemaker.service (cannot be started at this moment)
>>> - Set password for hacluster user on all nodes
>>> - pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
>>> - mkdir /var/run/gluster/shared_storage/nfs-ganesha/
>>> - touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not sure if needed)
>>> - vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and insert configuration
>>> - Try to list files on other nodes: ls /var/run/gluster/shared_storage/nfs-ganesha/
>>> - gluster nfs-ganesha enable
>>> - Check on other nodes that nfs-ganesha.service is running and "pcs status" shows started resources
>>> - gluster volume create mynewshare replica 3 transport tcp node1:/<dir> node2:/<dir> node3:/<dir>
>>> - gluster volume start mynewshare
>>> - gluster vol set mynewshare ganesha.enable on
>>>
>>> After these steps, all VIPs are pingable and I can mount node1:/mynewshare
>>>
>>> The funny thing is that pacemaker.service is disabled again (something disabled it). This is the status of the important (I think) services:
>>
>> Yeah. We too have observed this recently. Our guess is that the pcs cluster setup command first destroys the existing cluster (if any), which may be disabling pacemaker too.
>>
>>>
>>> systemctl list-units --all
>>> # corosync.service loaded active running
>>> # glusterd.service loaded active running
>>> # nfs-config.service loaded inactive dead
>>> # nfs-ganesha-config.service loaded inactive dead
>>> # nfs-ganesha-lock.service loaded active running
>>> # nfs-ganesha.service loaded active running
>>> # nfs-idmapd.service loaded inactive dead
>>> # nfs-mountd.service loaded inactive dead
>>> # nfs-server.service loaded inactive dead
>>> # nfs-utils.service loaded inactive dead
>>> # pacemaker.service loaded active running
>>> # pcsd.service loaded active running
>>>
>>> systemctl list-unit-files --all
>>> # corosync-notifyd.service disabled
>>> # corosync.service disabled
>>> # glusterd.service enabled
>>> # glusterfsd.service disabled
>>> # nfs-blkmap.service disabled
>>> # nfs-config.service static
>>> # nfs-ganesha-config.service static
>>> # nfs-ganesha-lock.service static
>>> # nfs-ganesha.service disabled
>>> # nfs-idmap.service static
>>> # nfs-idmapd.service static
>>> # nfs-lock.service static
>>> # nfs-mountd.service static
>>> # nfs-rquotad.service disabled
>>> # nfs-secure-server.service static
>>> # nfs-secure.service static
>>> # nfs-server.service disabled
>>> # nfs-utils.service static
>>> # nfs.service disabled
>>> # nfslock.service static
>>> # pacemaker.service disabled
>>> # pcsd.service enabled
>>>
>>> I enabled pacemaker again on all nodes and restarted all nodes one by one.
>>>
>>> After the reboot, all VIPs are gone and I can see that nfs-ganesha.service isn't running. When I start it on at least two nodes, the VIPs are pingable again and I can mount NFS again.
>>> But there is still some issue in the setup, because when I check nfs-ganesha-lock.service I get:
>>>
>>> systemctl -l status nfs-ganesha-lock.service
>>> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>>>    Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
>>>    Active: failed (Result: exit-code) since Fri 2017-05-05 13:43:37 UTC; 31min ago
>>>   Process: 6203 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)
>>>
>>> May 05 13:43:37 node0.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Version 1.3.0 starting
>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Flags: TI-RPC
>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open directory sm: Permission denied
>>
>> Okay, this issue was fixed, and the fix should be present in 3.10 too: https://review.gluster.org/#/c/16433/
>>
>> Please check /var/log/messages for statd-related errors and cross-check the permissions of that directory. For now, you could manually chown the owner:group of the /var/lib/nfs/statd/sm directory and then restart the nfs-ganesha* services.
>>
>> Thanks,
>> Soumya
>>
>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open /var/lib/nfs/statd/state: Permission denied
>>> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
>>> May 05 13:43:37 node0.localdomain systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
>>> May 05 13:43:37 node0.localdomain systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
>>> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service failed.
>>>
>>> Thank you,
>>>
>>> Kind regards,
>>>
>>> Adam
>>>
>>> On Wed, May 3, 2017 at 10:32 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>
>>> Hi,
>>>
>>> Same here: when I reboot the node, I have to manually execute "pcs cluster start gluster01", even though pcsd is already enabled and started.
>>>
>>> Gluster 3.8.11
>>>
>>> CentOS 7.3 latest
>>>
>>> Installed using the CentOS Storage SIG repository
>>>
>>> --
>>>
>>> Respectfully,
>>> Mahdi A. Mahdi
>>>
>>> ------------------------------------------------------------------------
>>> From: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Adam Ru <ad.ruckel at gmail.com>
>>> Sent: Wednesday, May 3, 2017 12:09:58 PM
>>> To: Soumya Koduri
>>> Cc: gluster-users at gluster.org
>>> Subject: Re: [Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
>>>
>>> Hi Soumya,
>>>
>>> Thank you very much for your reply.
>>>
>>> I enabled pcsd during setup, and after the reboot, during troubleshooting, I manually started it and checked the resources (pcs status). They were not running. I didn't find what was wrong, but I'm going to try it again.
>>>
>>> I've thoroughly checked
>>> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>>> and I can confirm that I followed all steps with one exception.
>>> I installed the following RPMs:
>>> glusterfs-server
>>> glusterfs-fuse
>>> glusterfs-cli
>>> glusterfs-ganesha
>>> nfs-ganesha-xfs
>>>
>>> and the guide referenced above specifies:
>>> glusterfs-server
>>> glusterfs-api
>>> glusterfs-ganesha
>>>
>>> glusterfs-api is a dependency of one of the RPMs that I installed, so this is not a problem. But I cannot find any mention of installing nfs-ganesha-xfs.
>>>
>>> I'll try to set up the whole environment again without installing nfs-ganesha-xfs (I assume glusterfs-ganesha has all the required binaries).
>>>
>>> Again, thank you for taking the time to answer my previous message.
>>>
>>> Kind regards,
>>> Adam
>>>
>>> On Tue, May 2, 2017 at 8:49 AM, Soumya Koduri <skoduri at redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> On 05/02/2017 01:34 AM, Rudolf wrote:
>>>
>>> Hi Gluster users,
>>>
>>> First, I'd like to thank you all for this amazing open-source project! Thank you!
>>>
>>> I'm working on a home project: three servers with Gluster and NFS-Ganesha. My goal is to create an HA NFS share with three copies of each file, one on each server.
>>>
>>> My systems are CentOS 7.3 Minimal installs with the latest updates and the most current RPMs from the "centos-gluster310" repository.
>>>
>>> I followed this tutorial:
>>> http://blog.gluster.org/2015/10/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
>>> (the second half, which describes the multi-node HA setup)
>>>
>>> with a few exceptions:
>>>
>>> 1. All RPMs are from the "centos-gluster310" repo, which is installed by "yum -y install centos-release-gluster"
>>> 2. I have three nodes (not four) with a "replica 3" volume.
>>> 3. I created an empty ganesha.conf and a non-empty ganesha-ha.conf in "/var/run/gluster/shared_storage/nfs-ganesha/" (the referenced blog post is outdated; this is now a requirement)
>>> 4. ganesha-ha.conf doesn't have "HA_VOL_SERVER", since this isn't needed anymore.
>>>
>>> Please refer to
>>> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>>>
>>> It is being updated with the latest changes to the setup.
>>>
>>> When I finish the configuration, all is good. nfs-ganesha.service is active and running, and from a client I can ping all three VIPs and mount NFS. Copied files are replicated to all nodes.
>>>
>>> But when I restart the nodes (one by one, with a 5 min. delay between them), I cannot ping or mount (I assume that all VIPs are down). So my setup definitely isn't HA.
>>>
>>> I found that:
>>> # pcs status
>>> Error: cluster is not currently running on this node
>>>
>>> This means the pcsd service is not up. Did you enable the pcsd service (systemctl enable pcsd) so that it comes up automatically after a reboot? If not, please start it manually.
>>>
>>> and nfs-ganesha.service is in the inactive state. Btw, I didn't run "systemctl enable nfs-ganesha", since I assume that this is something that Gluster does.
>>>
>>> Please check /var/log/ganesha.log for any errors/warnings.
>>>
>>> We recommend not enabling nfs-ganesha.service by default, as the shared storage (where the ganesha.conf file now resides) should be up and running before nfs-ganesha gets started. If it is enabled by default, the shared_storage mount point may not yet be up, which results in an nfs-ganesha service failure. If you would like to address this, you could have a cron job that keeps checking the mount point health and then starts the nfs-ganesha service.
>>>
>>> Thanks,
>>> Soumya
>>>
>>> I assume that my issue is that I followed instructions from a 2015/10 blog post that are outdated.
>>> Unfortunately, I cannot find anything better; I spent a whole day googling.
>>>
>>> Would you be so kind as to check the instructions in the blog post and let me know which steps are wrong / outdated? Or do you have more current instructions for a Gluster+Ganesha setup?
>>>
>>> Thank you.
>>>
>>> Kind regards,
>>> Adam
>>>
>>> --
>>> Adam
>>>
>>> --
>>> Adam
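Soumya's answer to question 2 (and her earlier cron-job suggestion) comes down to: wait until the shared storage is mounted, then start nfs-ganesha. Below is a minimal, hypothetical Python 3 sketch of such a helper; it is not a Gluster-provided tool. It assumes the shared storage appears in /proc/mounts with filesystem type `fuse.glusterfs`, and that /var/run resolves to /run (as on CentOS 7). The five-minute timeout, the poll interval and the `--start` flag are illustrative choices.

```python
import os
import subprocess
import sys
import time

# Paths as used in this thread.
SHARED_STORAGE = "/var/run/gluster/shared_storage"
GANESHA_CONF = os.path.join(SHARED_STORAGE, "nfs-ganesha", "ganesha.conf")


def is_gluster_mounted(mounts_text, mount_point):
    """True if /proc/mounts-style text lists mount_point as a FUSE GlusterFS mount."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mount_point and fields[2] == "fuse.glusterfs":
            return True
    return False


def wait_for_shared_storage(mount_point=SHARED_STORAGE, timeout_s=300, interval_s=5):
    """Poll /proc/mounts until the shared storage is mounted or the timeout expires."""
    # /proc/mounts shows the resolved path (/run/..., because /var/run is a symlink).
    target = os.path.realpath(mount_point)
    deadline = time.monotonic() + timeout_s
    while True:
        with open("/proc/mounts") as f:
            if is_gluster_mounted(f.read(), target):
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def main():
    if not wait_for_shared_storage():
        print("shared storage not mounted; leaving nfs-ganesha stopped", file=sys.stderr)
        return 1
    if not os.path.isfile(GANESHA_CONF):
        print("missing " + GANESHA_CONF, file=sys.stderr)
        return 1
    # systemctl exits non-zero if the unit fails to start.
    return subprocess.call(["systemctl", "start", "nfs-ganesha.service"])


# Act only when explicitly asked, so running or importing it by accident is harmless.
if __name__ == "__main__" and "--start" in sys.argv[1:]:
    sys.exit(main())
```

Such a script could be run from cron, as Soumya suggested, or from a `Type=oneshot` systemd unit ordered `After=glusterd.service`. Checking the mount type, rather than only the directory, avoids starting nfs-ganesha against the empty local directory that exists before the Gluster mount appears.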
hvjunk
2017-May-15  11:27 UTC
> On 15 May 2017, at 12:56 PM, Soumya Koduri <skoduri at redhat.com> wrote:
>
> On 05/12/2017 06:27 PM, Adam Ru wrote:
>> Hi Soumya,
>>
>> Thank you very much for your last response; it was very useful.
>>
>> I apologize for the delay; I had to find time for more testing.
>>
>> I updated the instructions that I provided in my previous e-mail. *** marks the steps that were added.
>>
>> Instructions:
>> - Clean installation of CentOS 7.3 with all updates, 3x node, resolvable IPs and VIPs
>> - Stopped firewalld (just for testing)
>> - *** SELinux in permissive mode (I had to, will explain below)
>> - Install "centos-release-gluster" to get the "centos-gluster310" repo

Should I also install centos-gluster310, or will that be chosen automagically by centos-release-gluster?

>> and install the following (nothing else):
>> --- glusterfs-server
>> --- glusterfs-ganesha
>> - Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem and secret.pem.pub on all nodes)
>> - systemctl enable and start glusterd
>> - gluster peer probe <other nodes>
>> - gluster volume set all cluster.enable-shared-storage enable

After this step, I'd advise (given my experience doing this with Ansible) making sure that the shared filesystem has propagated to all the nodes and that the needed entries have been made in fstab, as a safety check. I'll also load my systemd service and helper script to assist in cluster cold-bootstrapping.

>> - systemctl enable and start pcsd.service
>> - systemctl enable pacemaker.service (cannot be started at this moment)
>> - Set password for hacluster user on all nodes
>> - pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
>> - mkdir /var/run/gluster/shared_storage/nfs-ganesha/
>> - touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not sure if needed)
>> - vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and insert configuration
>> - Try to list files on other nodes: ls /var/run/gluster/shared_storage/nfs-ganesha/
>> - gluster nfs-ganesha enable
>> - *** systemctl enable pacemaker.service (again, since pacemaker was disabled at this point)
>> - *** Check owner of "state", "statd", "sm" and "sm.bak" in /var/lib/nfs/ (I had to: chown rpcuser:rpcuser /var/lib/nfs/statd/state)
>> - Check on other nodes that nfs-ganesha.service is running and "pcs status" shows started resources
>> - gluster volume create mynewshare replica 3 transport tcp node1:/<dir> node2:/<dir> node3:/<dir>
>> - gluster volume start mynewshare
>> - gluster vol set mynewshare ganesha.enable on
>>
>> At this moment, this is the status of the important (I think) services:
>>
>> -- corosync.service disabled
>> -- corosync-notifyd.service disabled
>> -- glusterd.service enabled
>> -- glusterfsd.service disabled
>> -- pacemaker.service enabled
>> -- pcsd.service enabled
>> -- nfs-ganesha.service disabled
>> -- nfs-ganesha-config.service static
>> -- nfs-ganesha-lock.service static
>>
>> -- corosync.service active (running)
>> -- corosync-notifyd.service inactive (dead)
>> -- glusterd.service active (running)
>> -- glusterfsd.service inactive (dead)
>> -- pacemaker.service active (running)
>> -- pcsd.service active (running)
>> -- nfs-ganesha.service active (running)
>> -- nfs-ganesha-config.service inactive (dead)
>> -- nfs-ganesha-lock.service active (running)
>>
>> May I ask you a few questions, please?
>>
>> 1. Could you please confirm that the services above have the correct status/state?
>
> Looks good to the best of my knowledge.
>
>>
>> 2. When I restart a node, nfs-ganesha is not running. Of course, I cannot enable it, since it needs to be started after the shared storage is mounted. What is the best practice for starting it automatically so I don't have to worry about restarting a node? Should I create a script that checks whether the shared storage is mounted and then starts nfs-ganesha? How do you do this in production?
>
> That's right. We have plans to address this in the near future (probably by having a new .service which mounts shared_storage before starting nfs-ganesha). Until then, yes, a custom script that does this is the only way to automate it.

Refer to my previous posting, which has a script and systemd service that address this problematic bootstrapping issue for locally mounted Gluster directories (which the shared directory is). That could be used (with my permission) as a basis to help fix this issue.
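hvjunk's own Ansible-driven script and systemd service are not included in this thread. As an independent illustration of the per-node safety check he describes, here is a hypothetical Python 3 sketch that confirms /etc/fstab contains a glusterfs entry for the shared storage and that the mount point is actually mounted. The expected fstab entry (a `glusterfs` line mounting something like `host:/gluster_shared_storage` on /var/run/gluster/shared_storage) is an assumption about what enabling `cluster.enable-shared-storage` adds; compare it with what you actually see on your nodes.

```python
import os

SHARED_STORAGE = "/var/run/gluster/shared_storage"


def normalize_mount_point(path):
    """Drop trailing slashes and map /var/run to /run (on CentOS 7 /var/run -> /run)."""
    path = path.rstrip("/") or "/"
    if path == "/var/run" or path.startswith("/var/run/"):
        path = "/run" + path[len("/var/run"):]
    return path


def find_shared_storage_entry(fstab_text, mount_point=SHARED_STORAGE):
    """Return the fields of the first glusterfs fstab entry for mount_point, or None."""
    want = normalize_mount_point(mount_point)
    for raw in fstab_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore comments and blank lines
        if (len(fields) >= 3 and fields[2] == "glusterfs"
                and normalize_mount_point(fields[1]) == want):
            return tuple(fields)
    return None


def check_node(fstab_path="/etc/fstab", mount_point=SHARED_STORAGE):
    """Return a list of human-readable problems (empty if the node looks fine)."""
    try:
        with open(fstab_path) as f:
            fstab_text = f.read()
    except OSError as exc:
        return ["cannot read {}: {}".format(fstab_path, exc)]
    problems = []
    if find_shared_storage_entry(fstab_text, mount_point) is None:
        problems.append("no glusterfs entry for {} in {}".format(mount_point, fstab_path))
    # os.path.ismount resolves the /var/run -> /run parent symlink on its own.
    if not os.path.ismount(mount_point):
        problems.append("{} is not mounted".format(mount_point))
    return problems


if __name__ == "__main__":
    issues = check_node()
    print("\n".join(issues) if issues else "shared storage looks OK")
```

Running it on every node right after `gluster volume set all cluster.enable-shared-storage enable` would catch a node where the shared volume has not propagated before any nfs-ganesha configuration is written to it.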
Adam Ru
2017-May-28  13:37 UTC
Hi Soumya,
Again, I apologize for the delay in responding. I'll try to file a bug.
In the meantime, I'm sending the AVCs and version numbers. The AVCs were
collected across two reboots; in both cases I manually started
nfs-ganesha.service, and nfs-ganesha-lock.service failed to start.
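Soumya asked for both the AVCs and the SELinux package versions. The output below covers the kernel and `sestatus`, but not the package versions. A hypothetical Python 3 helper that gathers everything for a bug report in one go might look like this. It only calls standard tools (`uname`, `rpm -q`, `sestatus -v`, `ausearch`); the package names queried are assumptions based on what this thread uses, and `ausearch` generally needs root.

```python
import shutil
import subprocess

# Commands for a bug report; the package list is an assumption based on this thread.
COMMANDS = [
    ["uname", "-r"],
    ["rpm", "-q", "selinux-policy", "selinux-policy-targeted",
     "nfs-utils", "nfs-ganesha", "glusterfs-ganesha"],
    ["sestatus", "-v"],
    # ausearch usually needs root to read the audit log.
    ["ausearch", "-m", "AVC,USER_AVC,SELINUX_ERR,USER_SELINUX_ERR", "-i"],
]


def run_command(argv):
    """Run argv and return combined stdout/stderr as text.
    A missing tool is reported in the output instead of raising."""
    if shutil.which(argv[0]) is None:
        return "{}: not found\n".format(argv[0])
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.stdout


def build_report(commands=COMMANDS, runner=run_command):
    """Join a '$ command' header and the captured output for each command."""
    return "\n".join("$ " + " ".join(argv) + "\n" + runner(argv) for argv in commands)


if __name__ == "__main__":
    print(build_report())
```

The injectable `runner` keeps the report formatting testable without root or SELinux on the machine.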
uname -r
3.10.0-514.21.1.el7.x86_64
sestatus -v
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28
Process contexts:
Current context:                unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Init context:                   system_u:system_r:init_t:s0
File contexts:
Controlling terminal:           unconfined_u:object_r:user_tty_device_t:s0
/etc/passwd                     system_u:object_r:passwd_file_t:s0
/etc/shadow                     system_u:object_r:shadow_t:s0
/bin/bash                       system_u:object_r:shell_exec_t:s0
/bin/login                      system_u:object_r:login_exec_t:s0
/bin/sh                         system_u:object_r:bin_t:s0 -> system_u:object_r:shell_exec_t:s0
/sbin/agetty                    system_u:object_r:getty_exec_t:s0
/sbin/init                      system_u:object_r:bin_t:s0 -> system_u:object_r:init_exec_t:s0
/usr/sbin/sshd                  system_u:object_r:sshd_exec_t:s0
sudo systemctl start nfs-ganesha.service
systemctl status -l nfs-ganesha-lock.service
● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sun 2017-05-28 14:12:48 UTC; 9s ago
  Process: 1991 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)

mynode0.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
mynode0.localdomain rpc.statd[1992]: Version 1.3.0 starting
mynode0.localdomain rpc.statd[1992]: Flags: TI-RPC
mynode0.localdomain rpc.statd[1992]: Failed to open directory sm: Permission denied
mynode0.localdomain systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
mynode0.localdomain systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
mynode0.localdomain systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
mynode0.localdomain systemd[1]: nfs-ganesha-lock.service failed.
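"Failed to open directory sm: Permission denied" can have more than one cause. Adam's earlier workaround was `chown rpcuser:rpcuser /var/lib/nfs/statd/state`, and Soumya suggested fixing the owner:group of /var/lib/nfs/statd/sm. A minimal read-only check, sketched here in Python 3 under the assumption that rpcuser:rpcuser is the expected owner of every path, lists what would need fixing and prints the matching `chown` commands without running them. If ownership is already correct, SELinux is the more likely culprit: the AVC records in this message report `tcontext=...fusefs_t` with `dev="fuse"` for `sm` and `state`, which suggests the statd directory resolves onto the FUSE-mounted shared storage.

```python
import grp
import os
import pwd
import sys

STATD_DIR = "/var/lib/nfs/statd"
# Paths mentioned in this thread; "" stands for the statd directory itself.
STATD_NAMES = ["", "sm", "sm.bak", "state"]
# Assumption: the owner used in Adam's workaround (chown rpcuser:rpcuser).
EXPECTED_USER = "rpcuser"
EXPECTED_GROUP = "rpcuser"


def collect_entries(base=STATD_DIR, names=STATD_NAMES):
    """Return (path, owner, group) for each existing path; missing paths are skipped."""
    entries = []
    for name in names:
        path = os.path.join(base, name) if name else base
        try:
            st = os.stat(path)
        except FileNotFoundError:
            continue  # e.g. sm.bak does not exist on every node
        except PermissionError:
            print("cannot stat {} (run as root)".format(path), file=sys.stderr)
            continue
        try:
            owner = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:  # uid without a passwd entry
            owner = str(st.st_uid)
        try:
            group = grp.getgrgid(st.st_gid).gr_name
        except KeyError:
            group = str(st.st_gid)
        entries.append((path, owner, group))
    return entries


def ownership_problems(entries, user=EXPECTED_USER, group=EXPECTED_GROUP):
    """Return the paths whose owner or group differs from user:group."""
    return [path for path, owner, grp_name in entries if (owner, grp_name) != (user, group)]


if __name__ == "__main__":
    # Read-only: print the suggested fix instead of applying it.
    for path in ownership_problems(collect_entries()):
        print("chown {}:{} {}".format(EXPECTED_USER, EXPECTED_GROUP, path))
```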
sudo ausearch -m AVC,USER_AVC,SELINUX_ERR,USER_SELINUX_ERR -i
----
type=SYSCALL msg=audit(05/28/2017 14:04:32.160:25) : arch=x86_64
syscall=bind success=yes exit=0 a0=0xf a1=0x7ffc757feb60 a2=0x10
a3=0x22 items=0 ppid=1149 pid=1157 auid=unset uid=root gid=root
euid=root suid=root fsuid=root egid=root sgid=root fsgid=root
tty=(none) ses=unset comm=glusterd exe=/usr/sbin/glusterfsd
subj=system_u:system_r:glusterd_t:s0 key=(null)
type=AVC msg=audit(05/28/2017 14:04:32.160:25) : avc:  denied  {
name_bind } for  pid=1157 comm=glusterd src=61000
scontext=system_u:system_r:glusterd_t:s0
tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket
----
type=SYSCALL msg=audit(05/28/2017 14:11:16.141:26) : arch=x86_64
syscall=bind success=no exit=EACCES(Permission denied) a0=0xf
a1=0x7ffffbf92620 a2=0x10 a3=0x22 items=0 ppid=1139 pid=1146
auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root
sgid=root fsgid=root tty=(none) ses=unset comm=glusterd
exe=/usr/sbin/glusterfsd subj=system_u:system_r:glusterd_t:s0
key=(null)
type=AVC msg=audit(05/28/2017 14:11:16.141:26) : avc:  denied  {
name_bind } for  pid=1146 comm=glusterd src=61000
scontext=system_u:system_r:glusterd_t:s0
tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket
----
type=SYSCALL msg=audit(05/28/2017 14:12:48.068:75) : arch=x86_64
syscall=openat success=no exit=EACCES(Permission denied)
a0=0xffffffffffffff9c a1=0x7efdc1ec3e10
a2=O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC a3=0x0 items=0 ppid=1991
pid=1992 auid=unset uid=root gid=root euid=root suid=root fsuid=root
egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rpc.statd
exe=/usr/sbin/rpc.statd subj=system_u:system_r:rpcd_t:s0 key=(null)
type=AVC msg=audit(05/28/2017 14:12:48.068:75) : avc:  denied  { read
} for  pid=1992 comm=rpc.statd name=sm dev="fuse"
ino=12866274077597183313 scontext=system_u:system_r:rpcd_t:s0
tcontext=system_u:object_r:fusefs_t:s0 tclass=dir
----
type=SYSCALL msg=audit(05/28/2017 14:12:48.080:76) : arch=x86_64
syscall=open success=no exit=EACCES(Permission denied)
a0=0x7efdc1ec3dd0 a1=O_RDONLY a2=0x7efdc1ec3de8 a3=0x5 items=0
ppid=1991 pid=1992 auid=unset uid=root gid=root euid=root suid=root
fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset
comm=rpc.statd exe=/usr/sbin/rpc.statd
subj=system_u:system_r:rpcd_t:s0 key=(null)
type=AVC msg=audit(05/28/2017 14:12:48.080:76) : avc:  denied  { read
} for  pid=1992 comm=rpc.statd name=state dev="fuse"
ino=12362789396445498341 scontext=system_u:system_r:rpcd_t:s0
tcontext=system_u:object_r:fusefs_t:s0 tclass=file
----
type=SYSCALL msg=audit(05/28/2017 14:17:37.177:26) : arch=x86_64
syscall=bind success=no exit=EACCES(Permission denied) a0=0xf
a1=0x7ffdfa768c70 a2=0x10 a3=0x22 items=0 ppid=1155 pid=1162
auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root
sgid=root fsgid=root tty=(none) ses=unset comm=glusterd
exe=/usr/sbin/glusterfsd subj=system_u:system_r:glusterd_t:s0
key=(null)
type=AVC msg=audit(05/28/2017 14:17:37.177:26) : avc:  denied  {
name_bind } for  pid=1162 comm=glusterd src=61000
scontext=system_u:system_r:glusterd_t:s0
tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket
----
type=SYSCALL msg=audit(05/28/2017 14:17:46.401:56) : arch=x86_64
syscall=kill success=no exit=EACCES(Permission denied) a0=0x560
a1=SIGKILL a2=0x7fd684000078 a3=0x0 items=0 ppid=1 pid=1167 auid=unset
uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root
fsgid=root tty=(none) ses=unset comm=glusterd exe=/usr/sbin/glusterfsd
subj=system_u:system_r:glusterd_t:s0 key=(null)
type=AVC msg=audit(05/28/2017 14:17:46.401:56) : avc:  denied  {
sigkill } for  pid=1167 comm=glusterd
scontext=system_u:system_r:glusterd_t:s0
tcontext=system_u:system_r:cluster_t:s0 tclass=process
----
type=SYSCALL msg=audit(05/28/2017 14:17:45.400:55) : arch=x86_64
syscall=kill success=no exit=EACCES(Permission denied) a0=0x560
a1=SIGTERM a2=0x7fd684000038 a3=0x99 items=0 ppid=1 pid=1167
auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root
sgid=root fsgid=root tty=(none) ses=unset comm=glusterd
exe=/usr/sbin/glusterfsd subj=system_u:system_r:glusterd_t:s0
key=(null)
type=AVC msg=audit(05/28/2017 14:17:45.400:55) : avc:  denied  {
signal } for  pid=1167 comm=glusterd
scontext=system_u:system_r:glusterd_t:s0
tcontext=system_u:system_r:cluster_t:s0 tclass=process
----
type=SYSCALL msg=audit(05/28/2017 14:18:56.024:67) : arch=x86_64
syscall=openat success=no exit=EACCES(Permission denied)
a0=0xffffffffffffff9c a1=0x7ff662e9be10
a2=O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC a3=0x0 items=0 ppid=1949
pid=1950 auid=unset uid=root gid=root euid=root suid=root fsuid=root
egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rpc.statd
exe=/usr/sbin/rpc.statd subj=system_u:system_r:rpcd_t:s0 key=(null)
type=AVC msg=audit(05/28/2017 14:18:56.024:67) : avc:  denied  { read
} for  pid=1950 comm=rpc.statd name=sm dev="fuse"
ino=12866274077597183313 scontext=system_u:system_r:rpcd_t:s0
tcontext=system_u:object_r:fusefs_t:s0 tclass=dir
----
type=SYSCALL msg=audit(05/28/2017 14:18:56.034:68) : arch=x86_64
syscall=open success=no exit=EACCES(Permission denied)
a0=0x7ff662e9bdd0 a1=O_RDONLY a2=0x7ff662e9bde8 a3=0x5 items=0
ppid=1949 pid=1950 auid=unset uid=root gid=root euid=root suid=root
fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset
comm=rpc.statd exe=/usr/sbin/rpc.statd
subj=system_u:system_r:rpcd_t:s0 key=(null)
type=AVC msg=audit(05/28/2017 14:18:56.034:68) : avc:  denied  { read
} for  pid=1950 comm=rpc.statd name=state dev="fuse"
ino=12362789396445498341 scontext=system_u:system_r:rpcd_t:s0
tcontext=system_u:object_r:fusefs_t:s0 tclass=file
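
For readability, the AVC records above can be condensed to one line per denial. Below is a minimal sketch in shell/awk. It assumes interpreted audit text in the layout shown above, which presumably came from `ausearch -i`. The function name `summarize_avc` is hypothetical. The function only reads stdin and changes nothing on the system, and it keeps only the first permission inside `{ ... }`.

```shell
#!/bin/bash
# summarize_avc: hypothetical helper. Reads interpreted audit records on stdin
# and prints one line per AVC denial: <comm> <permission> <tcontext> <tclass>.
# Records may be wrapped over several lines, so fields are collected until the
# next "type=" line or "----" separator.
summarize_avc() {
    awk '
        # Print the collected fields of the current AVC record, then reset.
        function flush() {
            if (in_avc) print comm, perm, tctx, tclass
            in_avc = 0; want_perm = 0
            comm = perm = tctx = tclass = ""
        }
        /^----/ { flush(); next }               # record separator
        /^type=/ {                              # start of a new record
            flush()
            if ($0 ~ /^type=AVC /) in_avc = 1
        }
        in_avc {
            for (i = 1; i <= NF; i++) {
                if (want_perm) { perm = $i; want_perm = 0 }   # first word after the brace
                else if ($i == "{") want_perm = 1            # brace may end a line
                else if ($i ~ /^comm=/) comm = substr($i, 6)
                else if ($i ~ /^tcontext=/) tctx = substr($i, 10)
                else if ($i ~ /^tclass=/) tclass = substr($i, 8)
            }
        }
        END { flush() }
    '
}
```

For example, `ausearch -m avc -i | summarize_avc` would reduce the records above to lines such as `rpc.statd read system_u:object_r:fusefs_t:s0 dir`.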
On Mon, May 15, 2017 at 11:56 AM, Soumya Koduri <skoduri at redhat.com>
wrote:
>
> On 05/12/2017 06:27 PM, Adam Ru wrote:
>>
>> Hi Soumya,
>>
>> Thank you very much for your last response, it was very useful.
>>
>> I apologize for the delay; I had to find time for another round of testing.
>>
>> I updated the instructions that I provided in my previous e-mail; ***
>> marks the steps that were added.
>>
>> Instructions:
>>  - Clean installation of CentOS 7.3 with all updates, 3x node,
>> resolvable IPs and VIPs
>>  - Stopped firewalld (just for testing)
>>  - *** SELinux in permissive mode (I had to; I will explain below)
>>  - Install "centos-release-gluster" to get the "centos-gluster310" repo
>> and install the following (nothing else):
>>  --- glusterfs-server
>>  --- glusterfs-ganesha
>>  - Passwordless SSH between all nodes
>> (/var/lib/glusterd/nfs/secret.pem and secret.pem.pub on all nodes)
>>  - systemctl enable and start glusterd
>>  - gluster peer probe <other nodes>
>>  - gluster volume set all cluster.enable-shared-storage enable
>>  - systemctl enable and start pcsd.service
>>  - systemctl enable pacemaker.service (cannot be started at this moment)
>>  - Set password for hacluster user on all nodes
>>  - pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
>>  - mkdir /var/run/gluster/shared_storage/nfs-ganesha/
>>  - touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not
>> sure if needed)
>>  - vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and
>> insert configuration
>>  - Try listing the files on the other nodes: ls
>> /var/run/gluster/shared_storage/nfs-ganesha/
>>  - gluster nfs-ganesha enable
>>  - *** systemctl enable pacemaker.service (again, since pacemaker was
>> disabled at this point)
>>  - *** Check the owner of "state", "statd", "sm" and "sm.bak" in
>> /var/lib/nfs/ (I had to run: chown rpcuser:rpcuser
>> /var/lib/nfs/statd/state)
>>  - Check on the other nodes that nfs-ganesha.service is running and "pcs
>> status" shows the resources as started
>>  - gluster volume create mynewshare replica 3 transport tcp
>> node1:/<dir> node2:/<dir> node3:/<dir>
>>  - gluster volume start mynewshare
>>  - gluster vol set mynewshare ganesha.enable on
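
The "insert configuration" step above does not show the file itself. Below is a minimal sketch of ganesha-ha.conf for a three-node cluster. It assumes the key names used in the Gluster NFS-Ganesha integration guide cited further down this thread: HA_NAME, HA_CLUSTER_NODES, and one VIP_<node> entry per node. The cluster name, host names, and addresses are placeholders. As also noted further down, HA_VOL_SERVER is no longer needed.

```shell
# /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf
# Sketch only: every value below is a placeholder.
# Name of the HA cluster.
HA_NAME="ganesha-ha-demo"
# Comma-separated host names of the nodes that form the HA cluster.
HA_CLUSTER_NODES="node1,node2,node3"
# One virtual IP per node listed above.
VIP_node1="192.168.1.101"
VIP_node2="192.168.1.102"
VIP_node3="192.168.1.103"
```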
>>
>> At this point, this is the status of the important (I think) services:
>>
>> -- corosync.service             disabled
>> -- corosync-notifyd.service     disabled
>> -- glusterd.service             enabled
>> -- glusterfsd.service           disabled
>> -- pacemaker.service            enabled
>> -- pcsd.service                 enabled
>> -- nfs-ganesha.service          disabled
>> -- nfs-ganesha-config.service   static
>> -- nfs-ganesha-lock.service     static
>>
>> -- corosync.service             active (running)
>> -- corosync-notifyd.service     inactive (dead)
>> -- glusterd.service             active (running)
>> -- glusterfsd.service           inactive (dead)
>> -- pacemaker.service            active (running)
>> -- pcsd.service                 active (running)
>> -- nfs-ganesha.service          active (running)
>> -- nfs-ganesha-config.service   inactive (dead)
>> -- nfs-ganesha-lock.service     active (running)
>>
>> May I ask you a few questions please?
>>
>> 1. Could you please confirm that the services above have the correct
>> status/state?
>
>
> Looks good to the best of my knowledge.
>
>>
>> 2. When I restart a node, nfs-ganesha is not running. Of course I
>> cannot enable it, since it must only start after the shared storage is
>> mounted. What is the best practice for starting it automatically so that
>> I don't have to worry about restarting a node? Should I create a script
>> that checks whether the shared storage was mounted and then starts
>> nfs-ganesha? How do you do this in production?
>
>
> That's right. We plan to address this in the near future (probably by
> adding a new .service unit that mounts shared_storage before starting
> nfs-ganesha). Until then, yes, a custom script is the only way to
> automate it.
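
Below is a minimal sketch of such a custom script. It assumes the shared volume is mounted at /run/gluster/shared_storage; on CentOS 7, /var/run is a symlink to /run, so /proc/mounts lists the /run path. The function names, the 5-second poll interval, and the 300-second default timeout are hypothetical choices. None of this is part of Gluster or NFS-Ganesha.

```shell
#!/bin/bash
# Hypothetical helpers (not shipped with Gluster or NFS-Ganesha).
# Assumption: shared storage is mounted at /run/gluster/shared_storage.

# shared_storage_mounted [MOUNT_POINT] [MOUNTS_FILE]
# Succeeds if MOUNT_POINT appears as a mount point (field 2) in MOUNTS_FILE.
shared_storage_mounted() {
    local mount_point=${1:-/run/gluster/shared_storage}
    local mounts_file=${2:-/proc/mounts}
    awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit (found ? 0 : 1) }' "$mounts_file"
}

# wait_for_shared_storage [MOUNT_POINT] [TIMEOUT_SECONDS] [MOUNTS_FILE]
# Polls every 5 seconds; fails if the mount has not appeared within the timeout.
wait_for_shared_storage() {
    local mount_point=${1:-/run/gluster/shared_storage}
    local timeout=${2:-300}
    local mounts_file=${3:-/proc/mounts}
    local waited=0
    until shared_storage_mounted "$mount_point" "$mounts_file"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "shared storage not mounted at $mount_point after ${timeout}s" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
}
```

A boot-time hook could then run `wait_for_shared_storage && systemctl start nfs-ganesha` as root. Note that an entry in /proc/mounts only shows that the mount exists, not that the FUSE mount is responsive.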
>
>
>>
>> 3. SELinux is an issue; is that a known bug?
>>
>> When I restart a node and start nfs-ganesha.service with SELinux in
>> permissive mode:
>>
>> sudo grep 'statd' /var/log/messages
>> May 12 12:05:46 mynode1 rpc.statd[2415]: Version 1.3.0 starting
>> May 12 12:05:46 mynode1 rpc.statd[2415]: Flags: TI-RPC
>> May 12 12:05:46 mynode1 rpc.statd[2415]: Failed to read
>> /var/lib/nfs/statd/state: Success
>> May 12 12:05:46 mynode1 rpc.statd[2415]: Initializing NSM state
>> May 12 12:05:52 mynode1 rpc.statd[2415]: Received SM_UNMON_ALL request
>> from mynode1.localdomain while not monitoring any hosts
>>
>> systemctl status nfs-ganesha-lock.service --full
>> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>>    Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service;
>> static; vendor preset: disabled)
>>    Active: active (running) since Fri 2017-05-12 12:05:46 UTC; 1min 43s
>> ago
>>   Process: 2414 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS
>> (code=exited, status=0/SUCCESS)
>>  Main PID: 2415 (rpc.statd)
>>    CGroup: /system.slice/nfs-ganesha-lock.service
>>            └─2415 /usr/sbin/rpc.statd --no-notify
>>
>> May 12 12:05:46 mynode1.localdomain systemd[1]: Starting NFS status
>> monitor for NFSv2/3 locking....
>> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Version 1.3.0
>> starting
>> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Flags: TI-RPC
>> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Failed to read
>> /var/lib/nfs/statd/state: Success
>> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Initializing NSM
>> state
>> May 12 12:05:46 mynode1.localdomain systemd[1]: Started NFS status
>> monitor for NFSv2/3 locking..
>> May 12 12:05:52 mynode1.localdomain rpc.statd[2415]: Received
>> SM_UNMON_ALL request from mynode1.localdomain while not monitoring any
>> hosts
>>
>>
>> When I restart a node and start nfs-ganesha.service with SELinux in
>> enforcing mode:
>>
>>
>> sudo grep 'statd' /var/log/messages
>> May 12 12:14:01 mynode1 rpc.statd[1743]: Version 1.3.0 starting
>> May 12 12:14:01 mynode1 rpc.statd[1743]: Flags: TI-RPC
>> May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open directory sm:
>> Permission denied
>> May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open
>> /var/lib/nfs/statd/state: Permission denied
>>
>> systemctl status nfs-ganesha-lock.service --full
>> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>>    Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service;
>> static; vendor preset: disabled)
>>    Active: failed (Result: exit-code) since Fri 2017-05-12 12:14:01
>> UTC; 1min 21s ago
>>   Process: 1742 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS
>> (code=exited, status=1/FAILURE)
>>
>> May 12 12:14:01 mynode1.localdomain systemd[1]: Starting NFS status
>> monitor for NFSv2/3 locking....
>> May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Version 1.3.0
>> starting
>> May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Flags: TI-RPC
>> May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Failed to open
>> directory sm: Permission denied
>> May 12 12:14:01 mynode1.localdomain systemd[1]:
>> nfs-ganesha-lock.service: control process exited, code=exited status=1
>> May 12 12:14:01 mynode1.localdomain systemd[1]: Failed to start NFS
>> status monitor for NFSv2/3 locking..
>> May 12 12:14:01 mynode1.localdomain systemd[1]: Unit
>> nfs-ganesha-lock.service entered failed state.
>> May 12 12:14:01 mynode1.localdomain systemd[1]:
>> nfs-ganesha-lock.service failed.
>
>
> I can't remember right now. Could you please paste the AVCs you get and
> the SELinux package versions? Or, preferably, please file a bug so that we
> can get the details verified by the SELinux team.
>
> Thanks,
> Soumya
>
>
>>
>> On Fri, May 5, 2017 at 8:10 PM, Soumya Koduri <skoduri at redhat.com> wrote:
>>>
>>>
>>>
>>> On 05/05/2017 08:04 PM, Adam Ru wrote:
>>>>
>>>>
>>>> Hi Soumya,
>>>>
>>>> Thank you for the answer.
>>>>
>>>> Enabling Pacemaker? Yes, you're completely right, I didn't do it. Thank
>>>> you.
>>>>
>>>> I spent some time testing and I have some results. This is what I
>>>> did:
>>>>
>>>>  - Clean installation of CentOS 7.3 with all updates, 3x node,
>>>> resolvable IPs and VIPs
>>>>  - Stopped firewalld (just for testing)
>>>>  - Install "centos-release-gluster" to get the "centos-gluster310" repo and
>>>> install the following (nothing else):
>>>>  --- glusterfs-server
>>>>  --- glusterfs-ganesha
>>>>  - Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem
>>>> and secret.pem.pub on all nodes)
>>>>  - systemctl enable and start glusterd
>>>>  - gluster peer probe <other nodes>
>>>>  - gluster volume set all cluster.enable-shared-storage enable
>>>>  - systemctl enable and start pcsd.service
>>>>  - systemctl enable pacemaker.service (cannot be started at this moment)
>>>>  - Set password for hacluster user on all nodes
>>>>  - pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
>>>>  - mkdir /var/run/gluster/shared_storage/nfs-ganesha/
>>>>  - touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not
>>>> sure if needed)
>>>>  - vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and
>>>> insert the configuration
>>>>  - Try listing the files on the other nodes: ls
>>>> /var/run/gluster/shared_storage/nfs-ganesha/
>>>>  - gluster nfs-ganesha enable
>>>>  - Check on the other nodes that nfs-ganesha.service is running and "pcs
>>>> status" shows the resources as started
>>>>  - gluster volume create mynewshare replica 3 transport tcp node1:/<dir>
>>>> node2:/<dir> node3:/<dir>
>>>>  - gluster volume start mynewshare
>>>>  - gluster vol set mynewshare ganesha.enable on
>>>>
>>>> After these steps, all VIPs are pingable and I can mount
>>>> node1:/mynewshare
>>>>
>>>> The funny thing is that pacemaker.service is disabled again (something
>>>> disabled it). This is the status of the important (I think) services:
>>>
>>>
>>>
>>> Yeah, we observed this recently too. Our guess is that the "pcs cluster
>>> setup" command first destroys the existing cluster (if any), which may
>>> also be disabling pacemaker.
>>>
>>>>
>>>> systemctl list-units --all
>>>> # corosync.service             loaded    active   running
>>>> # glusterd.service             loaded    active   running
>>>> # nfs-config.service           loaded    inactive dead
>>>> # nfs-ganesha-config.service   loaded    inactive dead
>>>> # nfs-ganesha-lock.service     loaded    active   running
>>>> # nfs-ganesha.service          loaded    active   running
>>>> # nfs-idmapd.service           loaded    inactive dead
>>>> # nfs-mountd.service           loaded    inactive dead
>>>> # nfs-server.service           loaded    inactive dead
>>>> # nfs-utils.service            loaded    inactive dead
>>>> # pacemaker.service            loaded    active   running
>>>> # pcsd.service                 loaded    active   running
>>>>
>>>> systemctl list-unit-files --all
>>>> # corosync-notifyd.service    disabled
>>>> # corosync.service            disabled
>>>> # glusterd.service            enabled
>>>> # glusterfsd.service          disabled
>>>> # nfs-blkmap.service          disabled
>>>> # nfs-config.service          static
>>>> # nfs-ganesha-config.service  static
>>>> # nfs-ganesha-lock.service    static
>>>> # nfs-ganesha.service         disabled
>>>> # nfs-idmap.service           static
>>>> # nfs-idmapd.service          static
>>>> # nfs-lock.service            static
>>>> # nfs-mountd.service          static
>>>> # nfs-rquotad.service         disabled
>>>> # nfs-secure-server.service   static
>>>> # nfs-secure.service          static
>>>> # nfs-server.service          disabled
>>>> # nfs-utils.service           static
>>>> # nfs.service                 disabled
>>>> # nfslock.service             static
>>>> # pacemaker.service           disabled
>>>> # pcsd.service                enabled
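
The pacemaker.service line is the one that matters in the listing above. Below is a minimal sketch for spotting this kind of drift after setup. It assumes "UNIT STATE" lines like those above, as printed by `systemctl list-unit-files --no-legend`, optionally prefixed with "# ". The function `check_unit_states` is hypothetical. Its expected states are taken from this thread rather than from any official requirement: glusterd, pcsd, and pacemaker enabled, and nfs-ganesha left disabled as recommended further down.

```shell
#!/bin/bash
# check_unit_states: hypothetical helper. Reads "UNIT STATE" lines on stdin and
# prints each unit whose state differs from the expectation below.
# Returns 1 if there was any mismatch.
check_unit_states() {
    awk '
        BEGIN {
            # Expected boot-time states, taken from this thread (an assumption).
            want["glusterd.service"] = "enabled"
            want["pcsd.service"] = "enabled"
            want["pacemaker.service"] = "enabled"
            want["nfs-ganesha.service"] = "disabled"
        }
        # Accept both "UNIT STATE" and "# UNIT STATE" lines.
        { if ($1 == "#") { unit = $2; state = $3 } else { unit = $1; state = $2 } }
        (unit in want) && state != want[unit] {
            printf "%s is %s, expected %s\n", unit, state, want[unit]
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Usage: `systemctl list-unit-files --no-legend | check_unit_states`. Units not listed in the table are ignored.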
>>>>
>>>> I enabled pacemaker again on all nodes and restarted all nodes one by one.
>>>>
>>>> After the reboot, all VIPs are gone and I can see that nfs-ganesha.service
>>>> isn't running. When I start it on at least two nodes, the VIPs are
>>>> pingable again and I can mount NFS again. But there is still some issue
>>>> in the setup, because when I check nfs-ganesha-lock.service I get:
>>>>
>>>> systemctl -l status nfs-ganesha-lock.service
>>>> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>>>>    Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service;
>>>> static; vendor preset: disabled)
>>>>    Active: failed (Result: exit-code) since Fri 2017-05-05 13:43:37 UTC;
>>>> 31min ago
>>>>   Process: 6203 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS
>>>> (code=exited, status=1/FAILURE)
>>>>
>>>> May 05 13:43:37 node0.localdomain systemd[1]: Starting NFS status
>>>> monitor for NFSv2/3 locking....
>>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Version 1.3.0
>>>> starting
>>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Flags: TI-RPC
>>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open
>>>> directory sm: Permission denied
>>>
>>>
>>>
>>> Okay, this issue was fixed, and the fix should be present in 3.10 too:
>>>    https://review.gluster.org/#/c/16433/
>>>
>>> Please check '/var/log/messages' for statd-related errors and cross-check
>>> the permissions of that directory. For now, you could manually chown the
>>> owner:group of the /var/lib/nfs/statd/sm directory and then restart the
>>> nfs-ganesha* services.
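
Below is a minimal sketch of that cross-check. It assumes the expected owner is rpcuser:rpcuser, the owner used in the chown step earlier in this thread, and it relies on GNU `stat`. The helper name `check_owner` is hypothetical. It only reports each mismatch together with the chown command that would fix it; the actual change and the service restart are left to the administrator.

```shell
#!/bin/bash
# check_owner: hypothetical helper. Prints each existing PATH whose owner:group
# differs from EXPECTED, plus the chown command that would fix it.
# It only reports; it changes nothing. Returns 1 if any mismatch was found.
# Usage: check_owner EXPECTED PATH...
check_owner() {
    local expected=$1 path actual rc=0
    shift
    for path in "$@"; do
        [ -e "$path" ] || continue            # skip paths that do not exist
        actual=$(stat -c '%U:%G' "$path")     # GNU stat: owner and group names
        if [ "$actual" != "$expected" ]; then
            echo "$path is $actual; fix with: chown $expected $path"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: `check_owner rpcuser:rpcuser /var/lib/nfs/statd /var/lib/nfs/statd/sm /var/lib/nfs/statd/sm.bak /var/lib/nfs/statd/state`. The AVC records at the top of this thread show sm and state on a FUSE file system (dev="fuse"). In enforcing mode, a correct owner alone may therefore not be enough, because SELinux checks are separate from file ownership.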
>>>
>>> Thanks,
>>> Soumya
>>>
>>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open
>>>> /var/lib/nfs/statd/state: Permission denied
>>>> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service:
>>>> control process exited, code=exited status=1
>>>> May 05 13:43:37 node0.localdomain systemd[1]: Failed to start NFS status
>>>> monitor for NFSv2/3 locking..
>>>> May 05 13:43:37 node0.localdomain systemd[1]: Unit
>>>> nfs-ganesha-lock.service entered failed state.
>>>> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service
>>>> failed.
>>>>
>>>> Thank you,
>>>>
>>>> Kind regards,
>>>>
>>>> Adam
>>>>
>>>> On Wed, May 3, 2017 at 10:32 AM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>>> wrote:
>>>>
>>>>     Hi,
>>>>
>>>>
>>>>     Same here: when I reboot the node, I have to manually execute "pcs
>>>>     cluster start gluster01", even though pcsd is already enabled and
>>>>     started.
>>>>
>>>>     Gluster 3.8.11
>>>>
>>>>     Centos 7.3 latest
>>>>
>>>>     Installed using CentOS Storage SIG repository
>>>>
>>>>
>>>>
>>>>     --
>>>>
>>>>     Respectfully,
>>>>     Mahdi A. Mahdi
>>>>
>>>>
>>>>
>>>>     ------------------------------------------------------------------------
>>>>     From: gluster-users-bounces at gluster.org on behalf of Adam Ru
>>>>     <ad.ruckel at gmail.com>
>>>>     Sent: Wednesday, May 3, 2017 12:09:58 PM
>>>>     To: Soumya Koduri
>>>>     Cc: gluster-users at gluster.org
>>>>     Subject: Re: [Gluster-users] Gluster and NFS-Ganesha - cluster is
>>>>     down after reboot
>>>>
>>>>     Hi Soumya,
>>>>
>>>>     Thank you very much for your reply.
>>>>
>>>>     I enabled pcsd during setup, and after the reboot, while
>>>>     troubleshooting, I manually started it and checked the resources (pcs
>>>>     status). They were not running. I didn't find what was wrong, but I'm
>>>>     going to try it again.
>>>>
>>>>     I've thoroughly checked
>>>>     http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>>>>     and I can confirm that I followed all the steps with one exception. I
>>>>     installed the following RPMs:
>>>>     glusterfs-server
>>>>     glusterfs-fuse
>>>>     glusterfs-cli
>>>>     glusterfs-ganesha
>>>>     nfs-ganesha-xfs
>>>>
>>>>     and the guide referenced above specifies:
>>>>     glusterfs-server
>>>>     glusterfs-api
>>>>     glusterfs-ganesha
>>>>
>>>>     glusterfs-api is a dependency of one of the RPMs that I installed, so
>>>>     this is not a problem. But I cannot find any mention of installing
>>>>     nfs-ganesha-xfs.
>>>>
>>>>     I'll try to set up the whole environment again without installing
>>>>     nfs-ganesha-xfs (I assume glusterfs-ganesha has all the required
>>>>     binaries).
>>>>
>>>>     Again, thank you for taking the time to answer my previous message.
>>>>
>>>>     Kind regards,
>>>>     Adam
>>>>
>>>>     On Tue, May 2, 2017 at 8:49 AM, Soumya Koduri <skoduri at redhat.com>
>>>>     wrote:
>>>>
>>>>         Hi,
>>>>
>>>>         On 05/02/2017 01:34 AM, Rudolf wrote:
>>>>
>>>>             Hi Gluster users,
>>>>
>>>>             First, I'd like to thank you all for this amazing
>>>>             open-source project! Thank you!
>>>>
>>>>             I'm working on a home project: three servers with Gluster and
>>>>             NFS-Ganesha. My goal is to create an HA NFS share with three
>>>>             copies of each file, one on each server.
>>>>
>>>>             My systems are CentOS 7.3 Minimal installs with the latest
>>>>             updates and the most current RPMs from the "centos-gluster310"
>>>>             repository.
>>>>
>>>>             I followed this tutorial:
>>>>             http://blog.gluster.org/2015/10/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
>>>>             (the second half, which describes the multi-node HA setup)
>>>>
>>>>             with a few exceptions:
>>>>
>>>>             1. All RPMs are from the "centos-gluster310" repo that is
>>>>             installed by "yum -y install centos-release-gluster".
>>>>             2. I have three nodes (not four) with a "replica 3" volume.
>>>>             3. I created an empty ganesha.conf and a non-empty
>>>>             ganesha-ha.conf in "/var/run/gluster/shared_storage/nfs-ganesha/"
>>>>             (the referenced blog post is outdated; this is now a
>>>>             requirement).
>>>>             4. ganesha-ha.conf doesn't have "HA_VOL_SERVER", since this
>>>>             isn't needed anymore.
>>>>
>>>>
>>>>         Please refer to
>>>>         http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>>>>
>>>>         It is being updated with the latest changes to the setup.
>>>>
>>>>             When I finish the configuration, all is good:
>>>>             nfs-ganesha.service is active and running, and from a client
>>>>             I can ping all three VIPs and mount NFS. Copied files are
>>>>             replicated to all nodes.
>>>>
>>>>             But when I restart the nodes (one by one, with a 5-minute
>>>>             delay between them), I cannot ping or mount (I assume that all
>>>>             VIPs are down). So my setup definitely isn't HA.
>>>>
>>>>             I found that:
>>>>             # pcs status
>>>>             Error: cluster is not currently running on this node
>>>>
>>>>
>>>>         This means the pcsd service is not up. Did you enable the pcsd
>>>>         service (systemctl enable pcsd) so that it comes up automatically
>>>>         after a reboot? If not, please start it manually.
>>>>
>>>>
>>>>             and nfs-ganesha.service is in the inactive state. Btw. I didn't
>>>>             run "systemctl enable nfs-ganesha", since I assume that this is
>>>>             something that Gluster does.
>>>>
>>>>
>>>>         Please check /var/log/ganesha.log for any errors/warnings.
>>>>
>>>>         We recommend not enabling nfs-ganesha.service by default, because
>>>>         the shared storage (where the ganesha.conf file now resides) must
>>>>         be up and running before nfs-ganesha starts. If the service is
>>>>         enabled by default, the shared_storage mount point may not be up
>>>>         yet, which results in an nfs-ganesha service failure. If you would
>>>>         like to address this, you could have a cron job that keeps
>>>>         checking the mount point health and then starts the nfs-ganesha
>>>>         service.
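
Below is a minimal sketch of such a cron job as an /etc/cron.d entry; the file name is hypothetical. It assumes the shared storage mount point is /run/gluster/shared_storage, which is the same directory as /var/run/gluster/shared_storage on CentOS 7. It uses `mountpoint -q` only as a presence check, so it will not detect a mount that exists but hangs. The PATH line is there because cron's default PATH is minimal.

```
# /etc/cron.d/nfs-ganesha-start (hypothetical file name)
PATH=/usr/sbin:/usr/bin:/sbin:/bin
# Every minute: if the shared storage is mounted and nfs-ganesha is not active, start it.
* * * * * root mountpoint -q /run/gluster/shared_storage && ! systemctl is-active --quiet nfs-ganesha && systemctl start nfs-ganesha
```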
>>>>
>>>>         Thanks,
>>>>         Soumya
>>>>
>>>>
>>>>             I assume that my issue is that I followed the instructions in a
>>>>             blog post from 2015/10 that are outdated. Unfortunately, I
>>>>             cannot find anything better; I spent a whole day googling.
>>>>
>>>>             Would you be so kind as to check the instructions in the blog
>>>>             post and let me know which steps are wrong / outdated? Or do you
>>>>             have more current instructions for a Gluster+Ganesha setup?
>>>>
>>>>             Thank you.
>>>>
>>>>             Kind regards,
>>>>             Adam
>>>>
>>>>
>>>>
>>>>             _______________________________________________
>>>>             Gluster-users mailing list
>>>>             Gluster-users at gluster.org
>>>>             http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>
>>>>
>>>>
>>>>     --
>>>>     Adam
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Adam
>>
>>
>>
>>
>
-- 
Adam