Adam Ru
2017-May-12 12:57 UTC
[Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
Hi Soumya,

Thank you very much for the last response - very useful. I apologize for the delay; I had to find time for another round of testing.

I updated the instructions that I provided in the previous e-mail. *** means that the step was added.

Instructions:
- Clean installation of CentOS 7.3 with all updates, 3x node, resolvable IPs and VIPs
- Stopped firewalld (just for testing)
- *** SELinux in permissive mode (I had to, will explain below)
- Install "centos-release-gluster" to get the "centos-gluster310" repo and install the following (nothing else):
--- glusterfs-server
--- glusterfs-ganesha
- Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem and secret.pem.pub on all nodes)
- systemctl enable and start glusterd
- gluster peer probe <other nodes>
- gluster volume set all cluster.enable-shared-storage enable
- systemctl enable and start pcsd.service
- systemctl enable pacemaker.service (cannot be started at this moment)
- Set password for hacluster user on all nodes
- pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
- mkdir /var/run/gluster/shared_storage/nfs-ganesha/
- touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not sure if needed)
- vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and insert configuration (an example of the format is shown below, after this list)
- Try to list files on the other nodes: ls /var/run/gluster/shared_storage/nfs-ganesha/
- gluster nfs-ganesha enable
- *** systemctl enable pacemaker.service (again, since pacemaker was disabled at this point)
- *** Check owner of "state", "statd", "sm" and "sm.bak" in /var/lib/nfs/ (I had to: chown rpcuser:rpcuser /var/lib/nfs/statd/state)
- Check on the other nodes that nfs-ganesha.service is running and "pcs status" shows started resources
- gluster volume create mynewshare replica 3 transport tcp node1:/<dir> node2:/<dir> node3:/<dir>
- gluster volume start mynewshare
- gluster vol set mynewshare ganesha.enable on
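For reference, a ganesha-ha.conf for a three-node setup like this one is roughly of the following form (the cluster name, node names and VIP addresses here are only placeholders; the NFS-Ganesha integration guide linked later in this thread describes the exact format):

# Name of the HA cluster created by the ganesha setup scripts
HA_NAME="ganesha-ha-demo"
# Nodes participating in the HA cluster (must be resolvable names)
HA_CLUSTER_NODES="node1,node2,node3"
# One virtual IP per node listed above
VIP_node1="192.168.1.201"
VIP_node2="192.168.1.202"
VIP_node3="192.168.1.203"

As noted further down in the thread, HA_VOL_SERVER is no longer required with this version.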
At this moment, this is the status of the important (I think) services:

-- corosync.service disabled
-- corosync-notifyd.service disabled
-- glusterd.service enabled
-- glusterfsd.service disabled
-- pacemaker.service enabled
-- pcsd.service enabled
-- nfs-ganesha.service disabled
-- nfs-ganesha-config.service static
-- nfs-ganesha-lock.service static

-- corosync.service active (running)
-- corosync-notifyd.service inactive (dead)
-- glusterd.service active (running)
-- glusterfsd.service inactive (dead)
-- pacemaker.service active (running)
-- pcsd.service active (running)
-- nfs-ganesha.service active (running)
-- nfs-ganesha-config.service inactive (dead)
-- nfs-ganesha-lock.service active (running)

May I ask you a few questions, please?

1. Could you please confirm that the services above have the correct status/state?

2. When I restart a node, nfs-ganesha is not running. Of course I cannot simply enable it, since it must be started only after the shared storage is mounted. What is the best practice to start it automatically, so I don't have to worry about restarting a node? Should I create a script that checks whether the shared storage is mounted and then starts nfs-ganesha? How do you do this in production?

3. SELinux is an issue, is that a known bug?

When I restart a node and start nfs-ganesha.service with SELinux in permissive mode:

sudo grep 'statd' /var/log/messages
May 12 12:05:46 mynode1 rpc.statd[2415]: Version 1.3.0 starting
May 12 12:05:46 mynode1 rpc.statd[2415]: Flags: TI-RPC
May 12 12:05:46 mynode1 rpc.statd[2415]: Failed to read /var/lib/nfs/statd/state: Success
May 12 12:05:46 mynode1 rpc.statd[2415]: Initializing NSM state
May 12 12:05:52 mynode1 rpc.statd[2415]: Received SM_UNMON_ALL request from mynode1.localdomain while not monitoring any hosts

systemctl status nfs-ganesha-lock.service --full
● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
   Active: active (running) since Fri 2017-05-12 12:05:46 UTC; 1min 43s ago
  Process: 2414 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=0/SUCCESS)
 Main PID: 2415 (rpc.statd)
   CGroup: /system.slice/nfs-ganesha-lock.service
           └─2415 /usr/sbin/rpc.statd --no-notify

May 12 12:05:46 mynode1.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Version 1.3.0 starting
May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Flags: TI-RPC
May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Failed to read /var/lib/nfs/statd/state: Success
May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Initializing NSM state
May 12 12:05:46 mynode1.localdomain systemd[1]: Started NFS status monitor for NFSv2/3 locking..
May 12 12:05:52 mynode1.localdomain rpc.statd[2415]: Received SM_UNMON_ALL request from mynode1.localdomain while not monitoring any hosts

When I restart a node and start nfs-ganesha.service with SELinux in enforcing mode:

sudo grep 'statd' /var/log/messages
May 12 12:14:01 mynode1 rpc.statd[1743]: Version 1.3.0 starting
May 12 12:14:01 mynode1 rpc.statd[1743]: Flags: TI-RPC
May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open directory sm: Permission denied
May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open /var/lib/nfs/statd/state: Permission denied

systemctl status nfs-ganesha-lock.service --full
● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2017-05-12 12:14:01 UTC; 1min 21s ago
  Process: 1742 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)

May 12 12:14:01 mynode1.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Version 1.3.0 starting
May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Flags: TI-RPC
May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Failed to open directory sm: Permission denied
May 12 12:14:01 mynode1.localdomain systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
May 12 12:14:01 mynode1.localdomain systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
May 12 12:14:01 mynode1.localdomain systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
May 12 12:14:01 mynode1.localdomain systemd[1]: nfs-ganesha-lock.service failed.

On Fri, May 5, 2017 at 8:10 PM, Soumya Koduri <skoduri at redhat.com> wrote:
>
> On 05/05/2017 08:04 PM, Adam Ru wrote:
>>
>> Hi Soumya,
>>
>> Thank you for the answer.
>>
>> Enabling Pacemaker? Yes, you're completely right, I didn't do it. Thank you.
>>
>> I spent some time by testing and I have some results. This is what I did:
>>
>> - Clean installation of CentOS 7.3 with all updates, 3x node, resolvable IPs and VIPs
>> - Stopped firewalld (just for testing)
>> - Install "centos-release-gluster" to get "centos-gluster310" repo and install following (nothing else):
>> --- glusterfs-server
>> --- glusterfs-ganesha
>> - Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem and secret.pem.pub on all nodes)
>> - systemctl enable and start glusterd
>> - gluster peer probe <other nodes>
>> - gluster volume set all cluster.enable-shared-storage enable
>> - systemctl enable and start pcsd.service
>> - systemctl enable pacemaker.service (cannot be started at this moment)
>> - Set password for hacluster user on all nodes
>> - pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
>> - mkdir /var/run/gluster/shared_storage/nfs-ganesha/
>> - touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not sure if needed)
>> - vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and insert configuration
>> - Try list files on other nodes: ls /var/run/gluster/shared_storage/nfs-ganesha/
>> - gluster nfs-ganesha enable
>> - Check on other nodes that nfs-ganesha.service is running and "pcs status" shows started resources
>> - gluster volume create mynewshare replica 3 transport tcp node1:/<dir> node2:/<dir> node3:/<dir>
>> - gluster volume start mynewshare
>> - gluster vol set mynewshare ganesha.enable on
>>
>> After these steps, all VIPs are pingable and I can mount node1:/mynewshare
>>
>> Funny thing is that pacemaker.service is disabled again (something disabled it). This is status of important (I think) services:
>
> yeah. We too had observed this recently. We guess probably pcs cluster setup command first destroys existing cluster (if any) which may be disabling pacemaker too.
>
>> systemctl list-units --all
>> # corosync.service loaded active running
>> # glusterd.service loaded active running
>> # nfs-config.service loaded inactive dead
>> # nfs-ganesha-config.service loaded inactive dead
>> # nfs-ganesha-lock.service loaded active running
>> # nfs-ganesha.service loaded active running
>> # nfs-idmapd.service loaded inactive dead
>> # nfs-mountd.service loaded inactive dead
>> # nfs-server.service loaded inactive dead
>> # nfs-utils.service loaded inactive dead
>> # pacemaker.service loaded active running
>> # pcsd.service loaded active running
>>
>> systemctl list-unit-files --all
>> # corosync-notifyd.service disabled
>> # corosync.service disabled
>> # glusterd.service enabled
>> # glusterfsd.service disabled
>> # nfs-blkmap.service disabled
>> # nfs-config.service static
>> # nfs-ganesha-config.service static
>> # nfs-ganesha-lock.service static
>> # nfs-ganesha.service disabled
>> # nfs-idmap.service static
>> # nfs-idmapd.service static
>> # nfs-lock.service static
>> # nfs-mountd.service static
>> # nfs-rquotad.service disabled
>> # nfs-secure-server.service static
>> # nfs-secure.service static
>> # nfs-server.service disabled
>> # nfs-utils.service static
>> # nfs.service disabled
>> # nfslock.service static
>> # pacemaker.service disabled
>> # pcsd.service enabled
>>
>> I enabled pacemaker again on all nodes and restart all nodes one by one.
>>
>> After reboot all VIPs are gone and I can see that nfs-ganesha.service isn't running. When I start it on at least two nodes then VIPs are pingable again and I can mount NFS again.
>> But there is still some issue in the setup because when I check nfs-ganesha-lock.service I get:
>>
>> systemctl -l status nfs-ganesha-lock.service
>> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>>    Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
>>    Active: failed (Result: exit-code) since Fri 2017-05-05 13:43:37 UTC; 31min ago
>>   Process: 6203 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)
>>
>> May 05 13:43:37 node0.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Version 1.3.0 starting
>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Flags: TI-RPC
>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open directory sm: Permission denied
>
> Okay this issue was fixed and the fix should be present in 3.10 too -
> https://review.gluster.org/#/c/16433/
>
> Please check '/var/log/messages' for statd related errors and cross-check permissions of that directory. You could manually chown owner:group of /var/lib/nfs/statd/sm directory for now and then restart nfs-ganesha* services.
>
> Thanks,
> Soumya
>
>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open /var/lib/nfs/statd/state: Permission denied
>> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
>> May 05 13:43:37 node0.localdomain systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
>> May 05 13:43:37 node0.localdomain systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
>> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service failed.
>>
>> Thank you,
>>
>> Kind regards,
>>
>> Adam
>>
>> On Wed, May 3, 2017 at 10:32 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>
>> Hi,
>>
>> Same here, when i reboot the node i have to manually execute "pcs cluster start gluster01" and pcsd already enabled and started.
>>
>> Gluster 3.8.11
>> Centos 7.3 latest
>> Installed using CentOS Storage SIG repository
>>
>> --
>> Respectfully
>> Mahdi A. Mahdi
>>
>> ------------------------------------------------------------------------
>> From: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Adam Ru <ad.ruckel at gmail.com>
>> Sent: Wednesday, May 3, 2017 12:09:58 PM
>> To: Soumya Koduri
>> Cc: gluster-users at gluster.org
>> Subject: Re: [Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
>>
>> Hi Soumya,
>>
>> thank you very much for your reply.
>>
>> I enabled pcsd during setup and after reboot during troubleshooting I manually started it and checked resources (pcs status). They were not running. I didn't find what was wrong but I'm going to try it again.
>>
>> I've thoroughly checked
>> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>> and I can confirm that I followed all steps with one exception.
>> I installed following RPMs:
>> glusterfs-server
>> glusterfs-fuse
>> glusterfs-cli
>> glusterfs-ganesha
>> nfs-ganesha-xfs
>>
>> and the guide referenced above specifies:
>> glusterfs-server
>> glusterfs-api
>> glusterfs-ganesha
>>
>> glusterfs-api is a dependency of one of RPMs that I installed so this is not a problem. But I cannot find any mention to install nfs-ganesha-xfs.
>>
>> I'll try to setup the whole environment again without installing nfs-ganesha-xfs (I assume glusterfs-ganesha has all required binaries).
>>
>> Again, thank you for your time to answer my previous message.
>>
>> Kind regards,
>> Adam
>>
>> On Tue, May 2, 2017 at 8:49 AM, Soumya Koduri <skoduri at redhat.com> wrote:
>>
>> Hi,
>>
>> On 05/02/2017 01:34 AM, Rudolf wrote:
>>
>> Hi Gluster users,
>>
>> First, I'd like to thank you all for this amazing open-source! Thank you!
>>
>> I'm working on home project - three servers with Gluster and NFS-Ganesha. My goal is to create HA NFS share with three copies of each file on each server.
>>
>> My systems are CentOS 7.3 Minimal install with the latest updates and the most current RPMs from "centos-gluster310" repository.
>>
>> I followed this tutorial:
>> http://blog.gluster.org/2015/10/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
>> (second half that describes multi-node HA setup)
>>
>> with a few exceptions:
>>
>> 1. All RPMs are from "centos-gluster310" repo that is installed by "yum -y install centos-release-gluster"
>> 2. I have three nodes (not four) with "replica 3" volume.
>> 3. I created empty ganesha.conf and not empty ganesha-ha.conf in "/var/run/gluster/shared_storage/nfs-ganesha/" (referenced blog post is outdated, this is now requirement)
>> 4. ganesha-ha.conf doesn't have "HA_VOL_SERVER" since this isn't needed anymore.
>>
>> Please refer to
>> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>>
>> It is being updated with latest changes happened wrt setup.
>>
>> When I finish configuration, all is good. nfs-ganesha.service is active and running and from client I can ping all three VIPs and I can mount NFS. Copied files are replicated to all nodes.
>>
>> But when I restart nodes (one by one, with 5 min. delay between) then I cannot ping or mount (I assume that all VIPs are down). So my setup definitely isn't HA.
>>
>> I found that:
>> # pcs status
>> Error: cluster is not currently running on this node
>>
>> This means pcsd service is not up. Did you enable (systemctl enable pcsd) pcsd service so that is comes up post reboot automatically. If not please start it manually.
>>
>> and nfs-ganesha.service is in inactive state. Btw. I didn't enable "systemctl enable nfs-ganesha" since I assume that this is something that Gluster does.
>>
>> Please check /var/log/ganesha.log for any errors/warnings.
>>
>> We recommend not to enable nfs-ganesha.service (by default), as the shared storage (where the ganesha.conf file resides now) should be up and running before nfs-ganesha gets started.
>> So if enabled by default it could happen that shared_storage mount point is not yet up and it resulted in nfs-ganesha service failure. If you would like to address this, you could have a cron job which keeps checking the mount point health and then start nfs-ganesha service.
>>
>> Thanks,
>> Soumya
>>
>> I assume that my issue is that I followed instructions in blog post from 2015/10 that are outdated. Unfortunately I cannot find anything better - I spent whole day by googling.
>>
>> Would you be so kind and check the instructions in blog post and let me know what steps are wrong / outdated? Or please do you have more current instructions for Gluster+Ganesha setup?
>>
>> Thank you.
>>
>> Kind regards,
>> Adam
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> --
>> Adam
>>
>> --
>> Adam

--
Adam
Soumya Koduri
2017-May-15 10:56 UTC
[Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
On 05/12/2017 06:27 PM, Adam Ru wrote:
> May I ask you a few questions, please?
>
> 1. Could you please confirm that the services above have the correct status/state?

Looks good to the best of my knowledge.

> 2. When I restart a node, nfs-ganesha is not running. Of course I cannot simply enable it, since it must be started only after the shared storage is mounted. What is the best practice to start it automatically, so I don't have to worry about restarting a node? Should I create a script that checks whether the shared storage is mounted and then starts nfs-ganesha? How do you do this in production?

That's right. We have plans to address this in the near future (probably by having a new .service which mounts shared_storage before starting nfs-ganesha).
But until then, yes, having a custom-defined script to do so is the only way to automate it.
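A rough, untested sketch of such a script (the shared-storage mount point is the one used earlier in this thread; the polling interval and timeout are arbitrary, and it could be run from cron or at boot):

#!/bin/bash
# Wait for the gluster shared storage to be mounted, then start nfs-ganesha.
mnt=/var/run/gluster/shared_storage

for _ in $(seq 1 60); do
    if mountpoint -q "$mnt"; then
        systemctl start nfs-ganesha
        exit $?
    fi
    sleep 5
done

echo "$mnt is still not mounted; not starting nfs-ganesha" >&2
exit 1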
> 3. SELinux is an issue, is that a known bug?
>
> When I restart a node and start nfs-ganesha.service with SELinux in enforcing mode:
>
> May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open directory sm: Permission denied
> May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open /var/lib/nfs/statd/state: Permission denied

Can't remember right now. Could you please paste the AVCs you get, and your SELinux package versions? Or preferably, please file a bug; we can get the details verified by the SELinux team.
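For example, something along these lines should capture both (assuming auditd is running; ausearch comes with the audit package):

# Recent SELinux denials involving rpc.statd / the statd state files
ausearch -m avc -ts recent | grep -i statd

# Fallback if ausearch is not installed
grep 'avc: .*denied' /var/log/audit/audit.log | grep -i statd

# Package versions that are useful for triage
rpm -q selinux-policy selinux-policy-targeted nfs-utils nfs-ganesha glusterfs-ganesha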
Thanks,
Soumya