Adam Ru
2017-May-05 14:34 UTC
[Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
Hi Soumya,

Thank you for the answer. Enabling Pacemaker? Yes, you're completely right, I didn't do it. Thank you.

I spent some time testing and I have some results. This is what I did:

- Clean installation of CentOS 7.3 with all updates; 3 nodes, resolvable IPs and VIPs
- Stopped firewalld (just for testing)
- Installed "centos-release-gluster" to get the "centos-gluster310" repo and installed the following (nothing else):
--- glusterfs-server
--- glusterfs-ganesha
- Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem and secret.pem.pub on all nodes)
- systemctl enable and start glusterd
- gluster peer probe <other nodes>
- gluster volume set all cluster.enable-shared-storage enable
- systemctl enable and start pcsd.service
- systemctl enable pacemaker.service (cannot be started at this moment)
- Set password for the hacluster user on all nodes
- pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
- mkdir /var/run/gluster/shared_storage/nfs-ganesha/
- touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not sure if needed)
- vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and insert configuration (a sketch of this file follows this list)
- Try to list files on the other nodes: ls /var/run/gluster/shared_storage/nfs-ganesha/
- gluster nfs-ganesha enable
- Check on the other nodes that nfs-ganesha.service is running and "pcs status" shows started resources
- gluster volume create mynewshare replica 3 transport tcp node1:/<dir> node2:/<dir> node3:/<dir>
- gluster volume start mynewshare
- gluster vol set mynewshare ganesha.enable on
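Since two of these steps are terse, here are sketches. For the passwordless SSH step, the key pair can be generated the way the integration guide describes (commands per the guide; verify them against the guide for your Gluster version, use an empty passphrase, and note node2/node3 below are placeholders):

# on one node; glusterd expects the key at exactly this path
ssh-keygen -f /var/lib/glusterd/nfs/secret.pem
# install the public key for root on every node
ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node2
ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@node3
# then copy secret.pem and secret.pem.pub to the same path on all nodes

And the ganesha-ha.conf I inserted was only a few lines. A minimal sketch (node names and VIP addresses are placeholders, HA_VOL_SERVER is omitted since it is no longer needed, and the exact VIP_* key format should be checked against the guide for your version):

# /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf
# name of the HA cluster (must be unique in the subnet)
HA_NAME="ganesha-ha-demo"
# comma-separated list of the nodes in the HA cluster
HA_CLUSTER_NODES="node1,node2,node3"
# one virtual IP per node (placeholder addresses)
VIP_node1="192.168.0.201"
VIP_node2="192.168.0.202"
VIP_node3="192.168.0.203"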
After these steps, all VIPs are pingable and I can mount node1:/mynewshare.

Funny thing is that pacemaker.service is disabled again (something disabled it). This is the status of the (I think) important services:

systemctl list-units --all
# corosync.service             loaded active   running
# glusterd.service             loaded active   running
# nfs-config.service           loaded inactive dead
# nfs-ganesha-config.service   loaded inactive dead
# nfs-ganesha-lock.service     loaded active   running
# nfs-ganesha.service          loaded active   running
# nfs-idmapd.service           loaded inactive dead
# nfs-mountd.service           loaded inactive dead
# nfs-server.service           loaded inactive dead
# nfs-utils.service            loaded inactive dead
# pacemaker.service            loaded active   running
# pcsd.service                 loaded active   running

systemctl list-unit-files --all
# corosync-notifyd.service     disabled
# corosync.service             disabled
# glusterd.service             enabled
# glusterfsd.service           disabled
# nfs-blkmap.service           disabled
# nfs-config.service           static
# nfs-ganesha-config.service   static
# nfs-ganesha-lock.service     static
# nfs-ganesha.service          disabled
# nfs-idmap.service            static
# nfs-idmapd.service           static
# nfs-lock.service             static
# nfs-mountd.service           static
# nfs-rquotad.service          disabled
# nfs-secure-server.service    static
# nfs-secure.service           static
# nfs-server.service           disabled
# nfs-utils.service            static
# nfs.service                  disabled
# nfslock.service              static
# pacemaker.service            disabled
# pcsd.service                 enabled

I enabled pacemaker again on all nodes and restarted all nodes one by one.

After the reboot all VIPs are gone and I can see that nfs-ganesha.service isn't running. When I start it on at least two nodes, the VIPs are pingable again and I can mount NFS again. But there is still some issue in the setup, because when I check nfs-ganesha-lock.service I get:

systemctl -l status nfs-ganesha-lock.service
● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2017-05-05 13:43:37 UTC; 31min ago
  Process: 6203 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)

May 05 13:43:37 node0.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
May 05 13:43:37 node0.localdomain rpc.statd[6205]: Version 1.3.0 starting
May 05 13:43:37 node0.localdomain rpc.statd[6205]: Flags: TI-RPC
May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open directory sm: Permission denied
May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open /var/lib/nfs/statd/state: Permission denied
May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
May 05 13:43:37 node0.localdomain systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
May 05 13:43:37 node0.localdomain systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service failed.

Thank you.

Kind regards,
Adam

On Wed, May 3, 2017 at 10:32 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

> Hi,
>
> Same here: when I reboot a node I have to manually execute "pcs cluster
> start gluster01", even though pcsd is already enabled and started.
>
> Gluster 3.8.11
> CentOS 7.3 latest
> Installed using the CentOS Storage SIG repository
>
> --
> Respectfully
> Mahdi A. Mahdi
>
> From: gluster-users-bounces at gluster.org on behalf of Adam Ru <ad.ruckel at gmail.com>
> Sent: Wednesday, May 3, 2017 12:09:58 PM
> To: Soumya Koduri
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
>
> Hi Soumya,
>
> thank you very much for your reply.
>
> I enabled pcsd during setup, and after the reboot, during troubleshooting,
> I manually started it and checked the resources (pcs status). They were
> not running. I didn't find what was wrong, but I'm going to try it again.
>
> I've thoroughly checked
> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
> and I can confirm that I followed all steps, with one exception. I
> installed the following RPMs:
> glusterfs-server
> glusterfs-fuse
> glusterfs-cli
> glusterfs-ganesha
> nfs-ganesha-xfs
>
> and the guide referenced above specifies:
> glusterfs-server
> glusterfs-api
> glusterfs-ganesha
>
> glusterfs-api is a dependency of one of the RPMs that I installed, so this
> is not a problem. But I cannot find any mention of installing nfs-ganesha-xfs.
>
> I'll try to set up the whole environment again without installing
> nfs-ganesha-xfs (I assume glusterfs-ganesha has all required binaries).
>
> Again, thank you for your time answering my previous message.
>
> Kind regards,
> Adam
>
> On Tue, May 2, 2017 at 8:49 AM, Soumya Koduri <skoduri at redhat.com> wrote:
>
>> Hi,
>>
>> On 05/02/2017 01:34 AM, Rudolf wrote:
>>
>>> Hi Gluster users,
>>>
>>> First, I'd like to thank you all for this amazing open-source! Thank you!
>>>
>>> I'm working on a home project: three servers with Gluster and
>>> NFS-Ganesha. My goal is to create an HA NFS share with three copies of
>>> each file on each server.
>>>
>>> My systems are CentOS 7.3 Minimal install with the latest updates and
>>> the most current RPMs from the "centos-gluster310" repository.
>>>
>>> I followed this tutorial:
>>> http://blog.gluster.org/2015/10/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
>>> (the second half, which describes the multi-node HA setup)
>>>
>>> with a few exceptions:
>>>
>>> 1. All RPMs are from the "centos-gluster310" repo that is installed by
>>> "yum -y install centos-release-gluster"
>>> 2. I have three nodes (not four) with a "replica 3" volume.
>>> 3. I created an empty ganesha.conf and a non-empty ganesha-ha.conf in
>>> "/var/run/gluster/shared_storage/nfs-ganesha/" (the referenced blog post
>>> is outdated; this is now a requirement)
>>> 4. ganesha-ha.conf doesn't have "HA_VOL_SERVER" since this isn't needed
>>> anymore.
>>
>> Please refer to
>> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>>
>> It is being updated with the latest changes to the setup.
>>
>>> When I finish the configuration, all is good. nfs-ganesha.service is
>>> active and running, and from the client I can ping all three VIPs and
>>> I can mount NFS. Copied files are replicated to all nodes.
>>>
>>> But when I restart the nodes (one by one, with a 5 min. delay between)
>>> then I cannot ping or mount (I assume that all VIPs are down). So my
>>> setup definitely isn't HA.
>>>
>>> I found that:
>>> # pcs status
>>> Error: cluster is not currently running on this node
>>
>> This means the pcsd service is not up. Did you enable the pcsd service
>> (systemctl enable pcsd) so that it comes up automatically after reboot?
>> If not, please start it manually.
>>
>>> and nfs-ganesha.service is in inactive state. Btw. I didn't enable
>>> "systemctl enable nfs-ganesha" since I assume that this is something
>>> that Gluster does.
>>
>> Please check /var/log/ganesha.log for any errors/warnings.
>>
>> We recommend not to enable nfs-ganesha.service (by default), as the
>> shared storage (where the ganesha.conf file resides now) should be up
>> and running before nfs-ganesha gets started.
>> So if enabled by default, it could happen that the shared_storage mount
>> point is not yet up, resulting in nfs-ganesha service failure. If you
>> would like to address this, you could have a cron job which keeps
>> checking the mount point health and then starts the nfs-ganesha service.
>>
>> Thanks,
>> Soumya
>>
>>> I assume that my issue is that I followed instructions in a blog post
>>> from 2015/10 that are outdated. Unfortunately I cannot find anything
>>> better; I spent a whole day googling.
>>>
>>> Would you be so kind as to check the instructions in the blog post and
>>> let me know which steps are wrong / outdated? Or do you have more
>>> current instructions for a Gluster+Ganesha setup?
>>>
>>> Thank you.
>>>
>>> Kind regards,
>>> Adam
>
> --
> Adam

--
Adam
Soumya Koduri
2017-May-05 19:10 UTC
[Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot
On 05/05/2017 08:04 PM, Adam Ru wrote:

> [...]
>
> Funny thing is that pacemaker.service is disabled again (something
> disabled it).

Yeah, we too observed this recently. We guess that the pcs cluster setup command probably first destroys the existing cluster (if any), which may be disabling pacemaker too.
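Until that is sorted out, a simple guard is to re-check the unit on every node after running "gluster nfs-ganesha enable". A sketch (plain systemd/pcs commands; adjust the node handling to your setup):

# on every node, once the ganesha HA cluster has been set up:
systemctl is-enabled pacemaker || systemctl enable pacemaker

# or let pcs enable the cluster services (corosync + pacemaker)
# on all cluster nodes in one go:
pcs cluster enable --all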
> I enabled pacemaker again on all nodes and restarted all nodes one by one.
>
> After the reboot all VIPs are gone and I can see that nfs-ganesha.service
> isn't running. When I start it on at least two nodes, the VIPs are
> pingable again and I can mount NFS again. But there is still some issue
> in the setup, because when I check nfs-ganesha-lock.service I get:
>
> [...]
> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open
> directory sm: Permission denied
> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open
> /var/lib/nfs/statd/state: Permission denied

Okay, this issue was fixed and the fix should be present in 3.10 too:
https://review.gluster.org/#/c/16433/

Please check /var/log/messages for statd-related errors and cross-check the permissions of that directory. You could manually chown owner:group of the /var/lib/nfs/statd/sm directory for now and then restart the nfs-ganesha* services.
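For example (on stock CentOS 7 the statd state directory is normally owned by rpcuser; treat the owner below as an assumption and verify it on a working node first):

# inspect the ownership rpc.statd complains about
ls -ld /var/lib/nfs/statd /var/lib/nfs/statd/sm
# rpcuser:rpcuser is the usual owner on CentOS 7 (assumption)
chown -R rpcuser:rpcuser /var/lib/nfs/statd
# then restart the ganesha services
systemctl restart nfs-ganesha-lock.service nfs-ganesha.service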
[...]

>> We recommend not to enable nfs-ganesha.service (by default), as the
>> shared storage (where the ganesha.conf file resides now) should be up
>> and running before nfs-ganesha gets started. If you would like to
>> address this, you could have a cron job which keeps checking the mount
>> point health and then starts the nfs-ganesha service.
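A minimal sketch of such a mount-check job (the script path, schedule, and mount point are assumptions; adjust to your setup):

# /etc/cron.d/nfs-ganesha-check -- hypothetical cron entry, runs every minute
* * * * * root /usr/local/sbin/check-ganesha.sh

# /usr/local/sbin/check-ganesha.sh -- hypothetical helper script
#!/bin/sh
# start nfs-ganesha only once the Gluster shared storage is mounted
if mountpoint -q /var/run/gluster/shared_storage && \
   ! systemctl is-active --quiet nfs-ganesha; then
    systemctl start nfs-ganesha
fi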
Thanks,
Soumya