thr3ads.net - Gluster users - [Gluster-users] Questions on ganesha HA and shared storage size [Jun 2015]

Alessandro De Salvo

2015-Jun-10 00:19 UTC

[Gluster-users] Questions on ganesha HA and shared storage size

Hi,
I have enabled full debug already, but I see nothing special. Before
exporting any volume the log shows no error, even when I do a showmount (the log
is attached, ganesha.log.gz). If I do the same after exporting a volume,
nfs-ganesha does not even start, complaining that it cannot bind the IPv6
rquota socket, but in fact nothing is listening on that port over IPv6, so it
should not happen:

tcp6       0      0 :::111                  :::*                    LISTEN      7433/rpcbind
tcp6       0      0 :::2224                 :::*                    LISTEN      9054/ruby
tcp6       0      0 :::22                   :::*                    LISTEN      1248/sshd
udp6       0      0 :::111                  :::*                                7433/rpcbind
udp6       0      0 fe80::8c2:27ff:fef2:123 :::*                                31238/ntpd
udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
udp6       0      0 ::1:123                 :::*                                31238/ntpd
udp6       0      0 fe80::5484:7aff:fef:123 :::*                                31238/ntpd
udp6       0      0 :::123                  :::*                                31238/ntpd
udp6       0      0 :::824                  :::*                                7433/rpcbind

The error, as shown in the attached ganesha-after-export.log.gz logfile, is the
following:


10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
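
A quick way to double-check which process, if any, holds the RQUOTA port is to
filter the listening sockets by port number. The snippet below is only a
minimal sketch under stated assumptions: the helper name port_listeners is
made up for illustration, and 875 is assumed to be the RQUOTA port because
that is where rpcinfo reported rquotad in the earlier output quoted below.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -lpn`-style output on stdin and print
# only the lines whose local address (4th column) ends in ":<port>".
port_listeners() {
    port="$1"
    awk -v p="$port" '$4 ~ (":" p "$") { print }'
}

# Possible use on a live node (run as root so the PID/program column is set):
#   netstat -lpn | port_listeners 875
#   rpcinfo -p | grep rquotad
```

An empty result from the netstat filter means that no socket is currently
bound to that port; the rpcinfo line shows what is still registered with
rpcbind, which is a separate question.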


Thanks,

	Alessandro

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ganesha.log.gz
Type: application/x-gzip
Size: 19427 bytes
Desc: not available
URL:
<http://www.gluster.org/pipermail/gluster-users/attachments/20150610/904ae82b/attachment.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ganesha-after-export.log.gz
Type: application/x-gzip
Size: 6936 bytes
Desc: not available
URL:
<http://www.gluster.org/pipermail/gluster-users/attachments/20150610/904ae82b/attachment-0001.gz>
-------------- next part --------------
> On 09 Jun 2015, at 18:37, Soumya Koduri <skoduri at redhat.com> wrote:
> 
> 
> 
> On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:
>> Another update: the fact that I was unable to use vol set ganesha.enable
>> was due to another bug in the ganesha scripts. In short, they are all
>> using the following line to get the location of the conf file:
>> 
>> CONF=$(cat /etc/sysconfig/ganesha | grep "CONFFILE" | cut -f 2 -d "=")
>> 
>> First of all, by default there is no CONFFILE line in
>> /etc/sysconfig/ganesha. Second, there is a bug in that directive: it
>> works if I add the following to /etc/sysconfig/ganesha
>> 
>> CONFFILE=/etc/ganesha/ganesha.conf
>> 
>> but it fails if the same is quoted
>> 
>> CONFFILE="/etc/ganesha/ganesha.conf"
>> 
>> It would be much better to use the following, which has a default as
>> well:
>> 
>> eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
>> CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
>> 
>> I'll update the bug report.
>> Having said this... the last issue to tackle is the real problem with
>> the ganesha.nfsd :-(
> 
> Thanks. Could you try changing the log level to NIV_FULL_DEBUG in
> '/etc/sysconfig/ganesha' and check if anything gets logged in
> '/var/log/ganesha.log' or '/ganesha.log'.
> 
> Thanks,
> Soumya
> 
>> Cheers,
>> 
>> 	Alessandro
>> 
>> 
>> On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:
>>> OK, I can confirm that the ganesha.nfsd process is actually not
>>> answering the calls. Here is what I see:
>>> 
>>> # rpcinfo -p
>>>    program vers proto   port  service
>>>     100000    4   tcp    111  portmapper
>>>     100000    3   tcp    111  portmapper
>>>     100000    2   tcp    111  portmapper
>>>     100000    4   udp    111  portmapper
>>>     100000    3   udp    111  portmapper
>>>     100000    2   udp    111  portmapper
>>>     100024    1   udp  41594  status
>>>     100024    1   tcp  53631  status
>>>     100003    3   udp   2049  nfs
>>>     100003    3   tcp   2049  nfs
>>>     100003    4   udp   2049  nfs
>>>     100003    4   tcp   2049  nfs
>>>     100005    1   udp  58127  mountd
>>>     100005    1   tcp  56301  mountd
>>>     100005    3   udp  58127  mountd
>>>     100005    3   tcp  56301  mountd
>>>     100021    4   udp  46203  nlockmgr
>>>     100021    4   tcp  41798  nlockmgr
>>>     100011    1   udp    875  rquotad
>>>     100011    1   tcp    875  rquotad
>>>     100011    2   udp    875  rquotad
>>>     100011    2   tcp    875  rquotad
>>> 
>>> # netstat -lpn | grep ganesha
>>> tcp6      14      0 :::2049                 :::*                    LISTEN      11937/ganesha.nfsd
>>> tcp6       0      0 :::41798                :::*                    LISTEN      11937/ganesha.nfsd
>>> tcp6       0      0 :::875                  :::*                    LISTEN      11937/ganesha.nfsd
>>> tcp6      10      0 :::56301                :::*                    LISTEN      11937/ganesha.nfsd
>>> tcp6       0      0 :::564                  :::*                    LISTEN      11937/ganesha.nfsd
>>> udp6       0      0 :::2049                 :::*                                11937/ganesha.nfsd
>>> udp6       0      0 :::46203                :::*                                11937/ganesha.nfsd
>>> udp6       0      0 :::58127                :::*                                11937/ganesha.nfsd
>>> udp6       0      0 :::875                  :::*                                11937/ganesha.nfsd
>>> 
>>> I'm attaching the strace of a showmount from one node to the other.
>>> This machinery was working with nfs-ganesha 2.1.0, so it must be
>>> something introduced with 2.2.0.
>>> Cheers,
>>> 
>>> 	Alessandro
>>> 
>>> 
>>> 
>>> On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:
>>>> 
>>>> On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
>>>>> Hi,
>>>>> OK, the problem with the VIPs not starting is due to the ganesha_mon
>>>>> heartbeat script looking for a pid file called
>>>>> /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v2.2.0
>>>>> creates /var/run/ganesha.pid; this needs to be corrected. The file is
>>>>> in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
>>>>> For the moment I have created a symlink in this way and it works:
>>>>> ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid
>>>>> 
>>>> Thanks. Please update this as well in the bug.
>>>> 
>>>>> So far so good, the VIPs are up and pingable, but there is still the
>>>>> problem of the hanging showmount (i.e. hanging RPC).
>>>>> Still, I see a lot of errors like this in /var/log/messages:
>>>>> 
>>>>> Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished: nfs-mon_monitor_10000:29292:stderr [ Error: Resource does not exist. ]
>>>>> 
>>>>> While ganesha.log shows the server is not in grace:
>>>>> 
>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!!
>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File ((null):0): Empty configuration file
>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL
>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60
>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists
>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (2:2)
>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully
>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started
>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :    NFS SERVER INITIALIZED
>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
>>>>> 09/06/2015 11:17:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
>>>>> 
>>>>> 
>>>> Please check the status of nfs-ganesha
>>>> $service nfs-ganesha status
>>>> 
>>>> Could you try taking a packet trace (during showmount or mount) and
>>>> check the server responses.
>>>> 
>>>> Thanks,
>>>> Soumya
>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Alessandro
>>>>> 
>>>>> 
>>>>>> On 09 Jun 2015, at 10:36, Alessandro De Salvo
>>>>>> <alessandro.desalvo at roma1.infn.it> wrote:
>>>>>> 
>>>>>> Hi Soumya,
>>>>>> 
>>>>>>> On 09 Jun 2015, at 08:06, Soumya Koduri
>>>>>>> <skoduri at redhat.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
>>>>>>>> OK, I found at least one of the bugs.
>>>>>>>> The /usr/libexec/ganesha/ganesha.sh has the following lines:
>>>>>>>> 
>>>>>>>>    if [ -e /etc/os-release ]; then
>>>>>>>>        RHEL6_PCS_CNAME_OPTION=""
>>>>>>>>    fi
>>>>>>>> 
>>>>>>>> This is OK for RHEL < 7, but does not work for >= 7. I have changed
>>>>>>>> it to the following, to make it work:
>>>>>>>> 
>>>>>>>>    if [ -e /etc/os-release ]; then
>>>>>>>>        eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
>>>>>>>>        [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
>>>>>>>>    fi
>>>>>>>> 
>>>>>>> Oh, thanks for the fix. Could you please file a bug for the same (and
>>>>>>> probably submit your fix as well). We shall have it corrected.
>>>>>> 
>>>>>> Just did it, https://bugzilla.redhat.com/show_bug.cgi?id=1229601
>>>>>> 
>>>>>>> 
>>>>>>>> Apart from that, the VIP_<node> I was using were wrong, and I should
>>>>>>>> have converted all the "-" to underscores, maybe this could be
>>>>>>>> mentioned in the documentation when you have it ready.
>>>>>>>> Now, the cluster starts, but the VIPs apparently do not:
>>>>>>>> 
>>>>>>> Sure. Thanks again for pointing it out. We shall make a note of it.
>>>>>>> 
>>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>> 
>>>>>>>> Full list of resources:
>>>>>>>> 
>>>>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>>     Started: [ atlas-node1 atlas-node2 ]
>>>>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>>     Started: [ atlas-node1 atlas-node2 ]
>>>>>>>> atlas-node1-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
>>>>>>>> atlas-node1-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>> atlas-node2-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
>>>>>>>> atlas-node2-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>> atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>> atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>> 
>>>>>>>> PCSD Status:
>>>>>>>>  atlas-node1: Online
>>>>>>>>  atlas-node2: Online
>>>>>>>> 
>>>>>>>> Daemon Status:
>>>>>>>>  corosync: active/disabled
>>>>>>>>  pacemaker: active/disabled
>>>>>>>>  pcsd: active/enabled
>>>>>>>> 
>>>>>>>> 
>>>>>>> Here corosync and pacemaker show a 'disabled' state. Can you check the
>>>>>>> status of their services? They should be running prior to cluster
>>>>>>> creation. We need to include that step in the document as well.
>>>>>> 
>>>>>> Ah, OK, you're right, I have added it to my puppet modules (we install
>>>>>> and configure ganesha via puppet, I'll put the module on puppetforge
>>>>>> soon, in case anyone is interested).
>>>>>> 
>>>>>>> 
>>>>>>>> But the issue that is puzzling me more is the following:
>>>>>>>> 
>>>>>>>> # showmount -e localhost
>>>>>>>> rpc mount export: RPC: Timed out
>>>>>>>> 
>>>>>>>> And when I try to enable the ganesha exports on a volume I get this
>>>>>>>> error:
>>>>>>>> 
>>>>>>>> # gluster volume set atlas-home-01 ganesha.enable on
>>>>>>>> volume set: failed: Failed to create NFS-Ganesha export config file.
>>>>>>>> 
>>>>>>>> But I see the file created in /etc/ganesha/exports/*.conf
>>>>>>>> Still, showmount hangs and times out.
>>>>>>>> Any help?
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>> Hmm, that's strange. Sometimes, if no proper cleanup was done
>>>>>>> while trying to re-create the cluster, we have seen such issues.
>>>>>>> 
>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1227709
>>>>>>> 
>>>>>>> http://review.gluster.org/#/c/11093/
>>>>>>> 
>>>>>>> Can you please unexport all the volumes, teardown the cluster using
>>>>>>> 'gluster vol set <volname> ganesha.enable off'
>>>>>> 
>>>>>> OK:
>>>>>> 
>>>>>> # gluster vol set atlas-home-01 ganesha.enable off
>>>>>> volume set: failed: ganesha.enable is already 'off'.
>>>>>> 
>>>>>> # gluster vol set atlas-data-01 ganesha.enable off
>>>>>> volume set: failed: ganesha.enable is already 'off'.
>>>>>> 
>>>>>> 
>>>>>>> 'gluster ganesha disable' command.
>>>>>> 
>>>>>> I'm assuming you wanted to write nfs-ganesha instead?
>>>>>> 
>>>>>> # gluster nfs-ganesha disable
>>>>>> ganesha enable : success
>>>>>> 
>>>>>> 
>>>>>> A side note (not really important): it's strange that when I do a
>>>>>> disable the message is "ganesha enable" :-)
>>>>>> 
>>>>>>> 
>>>>>>> Verify if the following files have been deleted on all the nodes-
>>>>>>> '/etc/cluster/cluster.conf'
>>>>>> 
>>>>>> this file is not present at all, I think it's not needed in CentOS 7
>>>>>> 
>>>>>>> '/etc/ganesha/ganesha.conf',
>>>>>> 
>>>>>> it's still there, but empty, and I guess it should be OK, right?
>>>>>> 
>>>>>>> '/etc/ganesha/exports/*'
>>>>>> 
>>>>>> no more files there
>>>>>> 
>>>>>>> '/var/lib/pacemaker/cib'
>>>>>> 
>>>>>> it's empty
>>>>>> 
>>>>>>> 
>>>>>>> Verify if the ganesha service is stopped on all the nodes.
>>>>>> 
>>>>>> nope, it's still running, I will stop it.
>>>>>> 
>>>>>>> 
>>>>>>> start/restart the services - corosync, pcs.
>>>>>> 
>>>>>> On the node where I issued the nfs-ganesha disable there is no longer
>>>>>> any /etc/corosync/corosync.conf, so corosync won't start. The other
>>>>>> node instead still has the file, it's strange.
>>>>>> 
>>>>>>> 
>>>>>>> And re-try the HA cluster creation
>>>>>>> 'gluster ganesha enable'
>>>>>> This time (repeated twice) it did not work at all:
>>>>>> 
>>>>>> # pcs status
>>>>>> Cluster name: ATLAS_GANESHA_01
>>>>>> Last updated: Tue Jun  9 10:13:43 2015
>>>>>> Last change: Tue Jun  9 10:13:22 2015
>>>>>> Stack: corosync
>>>>>> Current DC: atlas-node1 (1) - partition with quorum
>>>>>> Version: 1.1.12-a14efad
>>>>>> 2 Nodes configured
>>>>>> 6 Resources configured
>>>>>> 
>>>>>> 
>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>> 
>>>>>> Full list of resources:
>>>>>> 
>>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>     Started: [ atlas-node1 atlas-node2 ]
>>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>     Started: [ atlas-node1 atlas-node2 ]
>>>>>> atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>> atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>> 
>>>>>> PCSD Status:
>>>>>>  atlas-node1: Online
>>>>>>  atlas-node2: Online
>>>>>> 
>>>>>> Daemon Status:
>>>>>>  corosync: active/enabled
>>>>>>  pacemaker: active/enabled
>>>>>>  pcsd: active/enabled
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> I then tried "pcs cluster destroy" on both nodes, and then again
>>>>>> nfs-ganesha enable, but now I'm back to the old problem:
>>>>>> 
>>>>>> # pcs status
>>>>>> Cluster name: ATLAS_GANESHA_01
>>>>>> Last updated: Tue Jun  9 10:22:27 2015
>>>>>> Last change: Tue Jun  9 10:17:00 2015
>>>>>> Stack: corosync
>>>>>> Current DC: atlas-node2 (2) - partition with quorum
>>>>>> Version: 1.1.12-a14efad
>>>>>> 2 Nodes configured
>>>>>> 10 Resources configured
>>>>>> 
>>>>>> 
>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>> 
>>>>>> Full list of resources:
>>>>>> 
>>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>     Started: [ atlas-node1 atlas-node2 ]
>>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>     Started: [ atlas-node1 atlas-node2 ]
>>>>>> atlas-node1-cluster_ip-1       (ocf::heartbeat:IPaddr):        Stopped
>>>>>> atlas-node1-trigger_ip-1       (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>> atlas-node2-cluster_ip-1       (ocf::heartbeat:IPaddr):        Stopped
>>>>>> atlas-node2-trigger_ip-1       (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>> atlas-node1-dead_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>> atlas-node2-dead_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>> 
>>>>>> PCSD Status:
>>>>>>  atlas-node1: Online
>>>>>>  atlas-node2: Online
>>>>>> 
>>>>>> Daemon Status:
>>>>>>  corosync: active/enabled
>>>>>>  pacemaker: active/enabled
>>>>>>  pcsd: active/enabled
>>>>>> 
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Alessandro
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Soumya
>>>>>>> 
>>>>>>>> Alessandro
>>>>>>>> 
>>>>>>>>> On 08 Jun 2015, at 20:00, Alessandro De Salvo
>>>>>>>>> <Alessandro.DeSalvo at roma1.infn.it> wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> indeed, it does not work :-)
>>>>>>>>> OK, this is what I did, with 2 machines, running CentOS 7.1,
>>>>>>>>> Glusterfs 3.7.1 and nfs-ganesha 2.2.0:
>>>>>>>>> 
>>>>>>>>> 1) ensured that the machines are able to resolve their IPs (but
>>>>>>>>> this was already true since they were in the DNS);
>>>>>>>>> 2) disabled NetworkManager and enabled network on both machines;
>>>>>>>>> 3) created a gluster shared volume 'gluster_shared_storage' and
>>>>>>>>> mounted it on '/run/gluster/shared_storage' on all the cluster
>>>>>>>>> nodes using glusterfs native mount (on CentOS 7.1 there is a link
>>>>>>>>> by default /var/run -> ../run)
>>>>>>>>> 4) created an empty /etc/ganesha/ganesha.conf;
>>>>>>>>> 5) installed pacemaker pcs resource-agents corosync on all cluster
>>>>>>>>> machines;
>>>>>>>>> 6) set the same password for the 'hacluster' user on all machines;
>>>>>>>>> 7) pcs cluster auth <hostname> -u hacluster -p <pass> on all the
>>>>>>>>> nodes (on both nodes I issued the commands for both nodes)
>>>>>>>>> 8) IPv6 is configured by default on all nodes, although the
>>>>>>>>> infrastructure is not ready for IPv6
>>>>>>>>> 9) enabled pcsd and started it on all nodes
>>>>>>>>> 10) populated /etc/ganesha/ganesha-ha.conf with the following
>>>>>>>>> contents, one per machine:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ===> atlas-node1
>>>>>>>>> # Name of the HA cluster created.
>>>>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>>>>> # The server from which you intend to mount
>>>>>>>>> # the shared volume.
>>>>>>>>> HA_VOL_SERVER="atlas-node1"
>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>>>>> # is specified.
>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>>>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>>>>> VIP_atlas-node1="x.x.x.1"
>>>>>>>>> VIP_atlas-node2="x.x.x.2"
>>>>>>>>> 
>>>>>>>>> ===> atlas-node2
>>>>>>>>> # Name of the HA cluster created.
>>>>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>>>>> # The server from which you intend to mount
>>>>>>>>> # the shared volume.
>>>>>>>>> HA_VOL_SERVER="atlas-node2"
>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>>>>> # is specified.
>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>>>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>>>>> VIP_atlas-node1="x.x.x.1"
>>>>>>>>> VIP_atlas-node2="x.x.x.2"
>>>>>>>>> 
>>>>>>>>> 11) issued gluster nfs-ganesha enable, but it fails with a cryptic
>>>>>>>>> message:
>>>>>>>>> 
>>>>>>>>> # gluster nfs-ganesha enable
>>>>>>>>> Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the
>>>>>>>>> trusted pool. Do you still want to continue? (y/n) y
>>>>>>>>> nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha.
>>>>>>>>> Please check the log file for details
>>>>>>>>> 
>>>>>>>>> Looking at the logs I found nothing really
special but this:
>>>>>>>>> 
>>>>>>>>> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
>>>>>>>>> [2015-06-08 17:57:15.672844] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
>>>>>>>>> [2015-06-08 17:57:15.675395] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
>>>>>>>>> [2015-06-08 17:57:15.720692] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
>>>>>>>>> [2015-06-08 17:57:15.721161] I [glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host found Hostname is atlas-node2
>>>>>>>>> [2015-06-08 17:57:16.633048] E [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management: Initial NFS-Ganesha set up failed
>>>>>>>>> [2015-06-08 17:57:16.641563] E [glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of operation 'Volume (null)' failed on localhost : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
>>>>>>>>> 
>>>>>>>>> ==> /var/log/glusterfs/cmd_history.log <==
>>>>>>>>> [2015-06-08 17:57:16.643615]  : nfs-ganesha enable : FAILED : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
>>>>>>>>> 
>>>>>>>>> ==> /var/log/glusterfs/cli.log <==
>>>>>>>>> [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting with: -1
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Also, pcs seems to be fine for the auth part, although it obviously
>>>>>>>>> tells me the cluster is not running.
>>>>>>>>> 
>>>>>>>>> I, [2015-06-08T19:57:16.305323 #7223]  INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name
>>>>>>>>> I, [2015-06-08T19:57:16.345457 #7223]  INFO -- : Running: /usr/sbin/pcs cluster token-nodes
>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth HTTP/1.1" 200 68 0.1919
>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth HTTP/1.1" 200 68 0.1920
>>>>>>>>> atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET /remote/check_auth HTTP/1.1" 200 68
>>>>>>>>> - -> /remote/check_auth
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> What am I doing wrong?
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> Alessandro
>>>>>>>>> 
>>>>>>>>>> On 08 Jun 2015, at 19:30, Soumya Koduri
>>>>>>>>>> <skoduri at redhat.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
>>>>>>>>>>> Sorry, just another question:
>>>>>>>>>>> 
>>>>>>>>>>> - in my installation of gluster 3.7.1 the command gluster
>>>>>>>>>>> features.ganesha enable does not work:
>>>>>>>>>>> 
>>>>>>>>>>> # gluster features.ganesha enable
>>>>>>>>>>> unrecognized word: features.ganesha (position 0)
>>>>>>>>>>> 
>>>>>>>>>>> Which version has full support for it?
>>>>>>>>>> 
>>>>>>>>>> Sorry. This option has recently been changed. It is now
>>>>>>>>>> 
>>>>>>>>>> $ gluster nfs-ganesha enable
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> - in the documentation the ccs and cman packages are required,
>>>>>>>>>>> but they seem not to be available anymore on CentOS 7 and
>>>>>>>>>>> similar, I guess they are not really required anymore, as pcs
>>>>>>>>>>> should do the full job
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Alessandro
>>>>>>>>>> 
>>>>>>>>>> Looks like so from http://clusterlabs.org/quickstart-redhat.html.
>>>>>>>>>> Let us know if it doesn't work.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Soumya
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On 08 Jun 2015, at 15:09, Alessandro De Salvo
>>>>>>>>>>>> <alessandro.desalvo at roma1.infn.it> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Great, many thanks Soumya!
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> 
>>>>>>>>>>>> Alessandro
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 08 Jun 2015, at 13:53, Soumya Koduri
>>>>>>>>>>>>> <skoduri at redhat.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Please find the slides of the demo video at [1]
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We recommend having a distributed replica volume as the shared
>>>>>>>>>>>>> volume for better data availability.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The size of the volume depends on the workload you may have. Since
>>>>>>>>>>>>> it is used to maintain the states of NLM/NFSv4 clients, you may
>>>>>>>>>>>>> calculate the minimum size of the volume as the aggregate of
>>>>>>>>>>>>> (typical_size_of'/var/lib/nfs'_directory +
>>>>>>>>>>>>> ~4k*no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We shall document this feature soon in the gluster docs
>>>>>>>>>>>>> as well.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Soumya
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> I have seen the demo video on ganesha HA,
>>>>>>>>>>>>>> https://www.youtube.com/watch?v=Z4mvTQC-efM
>>>>>>>>>>>>>> However there is no advice on the appropriate size of the
>>>>>>>>>>>>>> shared volume. How is it really used, and what should be a
>>>>>>>>>>>>>> reasonable size for it?
>>>>>>>>>>>>>> Also, are the slides from the video available somewhere, as
>>>>>>>>>>>>>> well as documentation on all this? I did not manage to find
>>>>>>>>>>>>>> them.
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
>> 

Soumya Koduri

2015-Jun-10 09:58 UTC

[Gluster-users] Questions on ganesha HA and shared storage size

On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:
> Hi,
> I have enabled full debug already, but I see nothing special. Before
> exporting any volume the log shows no error, even when I do a showmount (the log
> is attached, ganesha.log.gz). If I do the same after exporting a volume,
> nfs-ganesha does not even start, complaining that it cannot bind the IPv6
> rquota socket, but in fact nothing is listening on that port over IPv6, so it
> should not happen:
>
> tcp6       0      0 :::111                  :::*                    LISTEN      7433/rpcbind
> tcp6       0      0 :::2224                 :::*                    LISTEN      9054/ruby
> tcp6       0      0 :::22                   :::*                    LISTEN      1248/sshd
> udp6       0      0 :::111                  :::*                                7433/rpcbind
> udp6       0      0 fe80::8c2:27ff:fef2:123 :::*                                31238/ntpd
> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
> udp6       0      0 ::1:123                 :::*                                31238/ntpd
> udp6       0      0 fe80::5484:7aff:fef:123 :::*                                31238/ntpd
> udp6       0      0 :::123                  :::*                                31238/ntpd
> udp6       0      0 :::824                  :::*                                7433/rpcbind
>
> The error, as shown in the attached ganesha-after-export.log.gz logfile, is
> the following:
>
>
> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
> 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
>
We have seen such issues with rpcbind a few times. The NFS-Ganesha setup
first disables Gluster-NFS and then brings up the NFS-Ganesha service.
Sometimes there is a delay or a problem with Gluster-NFS un-registering
those services, and when NFS-Ganesha tries to register on the same port, it
throws this error. Please try registering Rquota on a different port
using the config option below in "/etc/ganesha/ganesha.conf":

NFS_Core_Param {
         #Use a non-privileged port for RQuota
         Rquota_Port = 4501;
}

and clean up the '/var/cache/rpcbind/' directory before the setup.
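
Before restarting nfs-ganesha it can help to see which ports a service such as rquotad still has registered with rpcbind. The following is only a minimal sketch that filters `rpcinfo -p` style output read from standard input; the helper name `rpc_ports_for` is hypothetical and not part of the ganesha or gluster scripts.

```shell
# Print the distinct "proto port" pairs registered for one RPC service.
# Input: `rpcinfo -p` style output on standard input.
# $1: service name as shown in the last column (e.g. rquotad).
rpc_ports_for() {
    awk -v svc="$1" '$5 == svc { print $3, $4 }' | sort -u
}

# Typical use on a live node (assumes rpcinfo is installed):
#   rpcinfo -p | rpc_ports_for rquotad
```

If a registration is still listed while no ganesha.nfsd or Gluster-NFS process is running, that would be consistent with the stale registration described above.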

Thanks,
Soumya
>
> Thanks,
>
> 	Alessandro
>
>
>
>
>> On 09 Jun 2015, at 18:37, Soumya Koduri <skoduri at redhat.com> wrote:
>>
>>
>>
>> On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:
>>> Another update: the fact that I was unable to use vol set ganesha.enable
>>> was due to another bug in the ganesha scripts. In short, they are all
>>> using the following line to get the location of the conf file:
>>>
>>> CONF=$(cat /etc/sysconfig/ganesha | grep "CONFFILE" | cut -f 2 -d "=")
>>>
>>> First of all, by default there is no CONFFILE line in
>>> /etc/sysconfig/ganesha; second, there is a bug in that directive, as it
>>> works if I add in /etc/sysconfig/ganesha
>>>
>>> CONFFILE=/etc/ganesha/ganesha.conf
>>>
>>> but it fails if the same is quoted
>>>
>>> CONFFILE="/etc/ganesha/ganesha.conf"
>>>
>>> It would be much better to use the following, which has a default as
>>> well:
>>>
>>> eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
>>> CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
>>>
>>> I'll update the bug report.
>>> Having said this... the last issue to tackle is the real problem with
>>> the ganesha.nfsd :-(
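
The CONFFILE lookup discussed above can also be done without `eval`, by stripping an optional pair of quotes and falling back to a default. This is a sketch under assumptions rather than the fix that went into the ganesha scripts: the function name `ganesha_conf_path` is hypothetical, and the default path is the one used in this thread.

```shell
# Print the ganesha config path named by CONFFILE in a sysconfig-style file.
# $1: path to the sysconfig file (e.g. /etc/sysconfig/ganesha)
# Falls back to /etc/ganesha/ganesha.conf when CONFFILE is missing or empty.
ganesha_conf_path() (
    CONFFILE=""
    if [ -r "$1" ]; then
        # Use the last CONFFILE= assignment, ignoring leading blanks.
        line=$(sed -n 's/^[[:space:]]*CONFFILE=//p' "$1" | tail -n 1)
        # Strip one pair of surrounding double or single quotes, if present.
        case "$line" in
            \"*\") line=${line#\"}; line=${line%\"} ;;
            \'*\') line=${line#\'}; line=${line%\'} ;;
        esac
        CONFFILE=$line
    fi
    printf '%s\n' "${CONFFILE:-/etc/ganesha/ganesha.conf}"
)

# Typical use:
#   CONF=$(ganesha_conf_path /etc/sysconfig/ganesha)
```

The function body runs in a subshell, so it does not leave variables behind in the calling script, and nothing from the file is ever executed.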
>>
>> Thanks. Could you try changing the log level to NIV_FULL_DEBUG in
>> '/etc/sysconfig/ganesha' and check if anything gets logged in
>> '/var/log/ganesha.log' or '/ganesha.log'.
>>
>> Thanks,
>> Soumya
>>
>>> Cheers,
>>>
>>> 	Alessandro
>>>
>>>
>>> On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:
>>>> OK, I can confirm that the ganesha.nfsd process is actually not
>>>> answering the calls. Here is what I see:
>>>>
>>>> # rpcinfo -p
>>>>     program vers proto   port  service
>>>>      100000    4   tcp    111  portmapper
>>>>      100000    3   tcp    111  portmapper
>>>>      100000    2   tcp    111  portmapper
>>>>      100000    4   udp    111  portmapper
>>>>      100000    3   udp    111  portmapper
>>>>      100000    2   udp    111  portmapper
>>>>      100024    1   udp  41594  status
>>>>      100024    1   tcp  53631  status
>>>>      100003    3   udp   2049  nfs
>>>>      100003    3   tcp   2049  nfs
>>>>      100003    4   udp   2049  nfs
>>>>      100003    4   tcp   2049  nfs
>>>>      100005    1   udp  58127  mountd
>>>>      100005    1   tcp  56301  mountd
>>>>      100005    3   udp  58127  mountd
>>>>      100005    3   tcp  56301  mountd
>>>>      100021    4   udp  46203  nlockmgr
>>>>      100021    4   tcp  41798  nlockmgr
>>>>      100011    1   udp    875  rquotad
>>>>      100011    1   tcp    875  rquotad
>>>>      100011    2   udp    875  rquotad
>>>>      100011    2   tcp    875  rquotad
>>>>
>>>> # netstat -lpn | grep ganesha
>>>> tcp6      14      0 :::2049                 :::*                    LISTEN      11937/ganesha.nfsd
>>>> tcp6       0      0 :::41798                :::*                    LISTEN      11937/ganesha.nfsd
>>>> tcp6       0      0 :::875                  :::*                    LISTEN      11937/ganesha.nfsd
>>>> tcp6      10      0 :::56301                :::*                    LISTEN      11937/ganesha.nfsd
>>>> tcp6       0      0 :::564                  :::*                    LISTEN      11937/ganesha.nfsd
>>>> udp6       0      0 :::2049                 :::*                                11937/ganesha.nfsd
>>>> udp6       0      0 :::46203                :::*                                11937/ganesha.nfsd
>>>> udp6       0      0 :::58127                :::*                                11937/ganesha.nfsd
>>>> udp6       0      0 :::875                  :::*                                11937/ganesha.nfsd
>>>>
>>>> I'm attaching the strace of a showmount from one node to the other.
>>>> This machinery was working with nfs-ganesha 2.1.0, so it must be
>>>> something introduced with 2.2.0.
>>>> Cheers,
>>>>
>>>> 	Alessandro
>>>>
>>>>
>>>>
>>>> On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:
>>>>>
>>>>> On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
>>>>>> Hi,
>>>>>> OK, the problem with the VIPs not starting is due to the ganesha_mon
>>>>>> heartbeat script looking for a pid file called
>>>>>> /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0
>>>>>> creates /var/run/ganesha.pid; this needs to be corrected. The file is
>>>>>> in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
>>>>>> For the moment I have created a symlink in this way and it works:
>>>>>>
>>>>>> ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid
>>>>>>
>>>>> Thanks. Please update this as well in the bug.
>>>>>
>>>>>> So far so good, the VIPs are up and pingable, but there is still the
>>>>>> problem of the hanging showmount (i.e. hanging RPC).
>>>>>> Still, I see a lot of errors like this in /var/log/messages:
>>>>>>
>>>>>> Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished: nfs-mon_monitor_10000:29292:stderr [ Error: Resource does not exist. ]
>>>>>>
>>>>>> While ganesha.log shows the server is not in grace:
>>>>>>
>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!!
>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File ((null):0): Empty configuration file
>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL
>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60
>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists
>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (2:2)
>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully
>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started
>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
>>>>>> 09/06/2015 11:17:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
>>>>>>
>>>>>>
>>>>> Please check the status of nfs-ganesha
>>>>> $ service nfs-ganesha status
>>>>>
>>>>> Could you try taking a packet trace (during showmount or mount) and
>>>>> check the server responses?
>>>>>
>>>>> Thanks,
>>>>> Soumya
>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Alessandro
>>>>>>
>>>>>>
>>>>>>> On 09 Jun 2015, at 10:36, Alessandro De Salvo
>>>>>>> <alessandro.desalvo at roma1.infn.it> wrote:
>>>>>>>
>>>>>>> Hi Soumya,
>>>>>>>
>>>>>>>> On 09 Jun 2015, at 08:06, Soumya Koduri
>>>>>>>> <skoduri at redhat.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
>>>>>>>>> OK, I found at least one of the bugs.
>>>>>>>>> The /usr/libexec/ganesha/ganesha.sh has the following lines:
>>>>>>>>>
>>>>>>>>>     if [ -e /etc/os-release ]; then
>>>>>>>>>         RHEL6_PCS_CNAME_OPTION=""
>>>>>>>>>     fi
>>>>>>>>>
>>>>>>>>> This is OK for RHEL < 7, but does not work for >= 7. I have changed
>>>>>>>>> it to the following to make it work:
>>>>>>>>>
>>>>>>>>>     if [ -e /etc/os-release ]; then
>>>>>>>>>         eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
>>>>>>>>>         [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
>>>>>>>>>     fi
>>>>>>>>>
>>>>>>>> Oh.. Thanks for the fix. Could you please file a bug for the same (and
>>>>>>>> probably submit your fix as well). We shall have it corrected.
>>>>>>>
>>>>>>> Just did it, https://bugzilla.redhat.com/show_bug.cgi?id=1229601
>>>>>>>
>>>>>>>>
>>>>>>>>> Apart from that, the VIP_<node> names I was using were wrong, and I
>>>>>>>>> should have converted all the "-" to underscores; maybe this could be
>>>>>>>>> mentioned in the documentation when you have it ready.
>>>>>>>>> Now, the cluster starts, but the VIPs apparently do not:
>>>>>>>>>
>>>>>>>> Sure. Thanks again for pointing it out. We shall make a note of it.
>>>>>>>>
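
A hyphen is not a valid character in a shell variable name, which would explain why `VIP_atlas-node1` is not accepted if the HA config file is read as shell assignments (an assumption here, not confirmed in the thread). A minimal sketch of the conversion described above; the helper name `vip_var_name` is hypothetical:

```shell
# Print the VIP_<node> variable name for a hostname,
# replacing every "-" with "_" as described in the thread.
# $1: node hostname, e.g. atlas-node1
vip_var_name() {
    printf 'VIP_%s\n' "$1" | sed 's/-/_/g'
}

# Example: vip_var_name atlas-node1   prints VIP_atlas_node1
```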
>>>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>>>
>>>>>>>>> Full list of resources:
>>>>>>>>>
>>>>>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>> atlas-node1-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
>>>>>>>>> atlas-node1-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>>> atlas-node2-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
>>>>>>>>> atlas-node2-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>>> atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>>> atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>>>
>>>>>>>>> PCSD Status:
>>>>>>>>>   atlas-node1: Online
>>>>>>>>>   atlas-node2: Online
>>>>>>>>>
>>>>>>>>> Daemon Status:
>>>>>>>>>   corosync: active/disabled
>>>>>>>>>   pacemaker: active/disabled
>>>>>>>>>   pcsd: active/enabled
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Here corosync and pacemaker show the 'disabled' state. Can you check
>>>>>>>> the status of their services? They should be running prior to cluster
>>>>>>>> creation. We need to include that step in the documentation as well.
>>>>>>>
>>>>>>> Ah, OK, you're right, I have added it to my puppet modules (we install
>>>>>>> and configure ganesha via puppet; I'll put the module on puppetforge
>>>>>>> soon, in case anyone is interested).
>>>>>>>
>>>>>>>>
>>>>>>>>> But the issue that is puzzling me more is the following:
>>>>>>>>>
>>>>>>>>> # showmount -e localhost
>>>>>>>>> rpc mount export: RPC: Timed out
>>>>>>>>>
>>>>>>>>> And when I try to enable the ganesha exports on a volume I get this
>>>>>>>>> error:
>>>>>>>>>
>>>>>>>>> # gluster volume set atlas-home-01 ganesha.enable on
>>>>>>>>> volume set: failed: Failed to create NFS-Ganesha export config file.
>>>>>>>>>
>>>>>>>>> But I see the file created in /etc/ganesha/exports/*.conf
>>>>>>>>> Still, showmount hangs and times out.
>>>>>>>>> Any help?
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>> Hmm, that's strange. Sometimes, if there was no proper cleanup
>>>>>>>> done while trying to re-create the cluster, we have seen such issues.
>>>>>>>>
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1227709
>>>>>>>>
>>>>>>>> http://review.gluster.org/#/c/11093/
>>>>>>>>
>>>>>>>> Can you please unexport all the volumes, teardown the cluster using
>>>>>>>> 'gluster vol set <volname> ganesha.enable off'
>>>>>>>
>>>>>>> OK:
>>>>>>>
>>>>>>> # gluster vol set atlas-home-01 ganesha.enable off
>>>>>>> volume set: failed: ganesha.enable is already 'off'.
>>>>>>>
>>>>>>> # gluster vol set atlas-data-01 ganesha.enable off
>>>>>>> volume set: failed: ganesha.enable is already 'off'.
>>>>>>>
>>>>>>>
>>>>>>>> 'gluster ganesha disable' command.
>>>>>>>
>>>>>>> I'm assuming you wanted to write nfs-ganesha instead?
>>>>>>>
>>>>>>> # gluster nfs-ganesha disable
>>>>>>> ganesha enable : success
>>>>>>>
>>>>>>>
>>>>>>> A side note (not really important): it's strange that when I do a
>>>>>>> disable the message is "ganesha enable" :-)
>>>>>>>>
>>>>>>>> Verify if the following files have been deleted on all the nodes-
>>>>>>>> '/etc/cluster/cluster.conf'
>>>>>>>
>>>>>>> this file is not present at all, I think it's not needed in CentOS 7
>>>>>>>
>>>>>>>> '/etc/ganesha/ganesha.conf',
>>>>>>>
>>>>>>> it's still there, but empty, and I guess it should be OK, right?
>>>>>>>
>>>>>>>> '/etc/ganesha/exports/*'
>>>>>>>
>>>>>>> no more files there
>>>>>>>
>>>>>>>> '/var/lib/pacemaker/cib'
>>>>>>>
>>>>>>> it's empty
>>>>>>>
>>>>>>>>
>>>>>>>> Verify if the ganesha service is stopped on all the nodes.
>>>>>>>
>>>>>>> nope, it's still running, I will stop it.
>>>>>>>
>>>>>>>>
>>>>>>>> start/restart the services - corosync, pcs.
>>>>>>>
>>>>>>> On the node where I issued the nfs-ganesha disable there is no longer
>>>>>>> any /etc/corosync/corosync.conf, so corosync won't start. The other
>>>>>>> node instead still has the file, it's strange.
>>>>>>>
>>>>>>>>
>>>>>>>> And re-try the HA cluster creation
>>>>>>>> 'gluster ganesha enable'
>>>>>>>
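
The cleanup checks listed above can be scripted so they are easy to repeat on every node. This is only a sketch: the helper name `leftover_paths` is hypothetical, and treating an empty file or empty directory as "clean" is an assumption based on the replies in this thread, not a documented rule.

```shell
# Print each given path that still holds content:
# a non-empty file, or a directory with at least one entry.
leftover_paths() {
    for p in "$@"; do
        if [ -d "$p" ]; then
            # A directory counts as leftover only if it has entries.
            [ -n "$(ls -A "$p" 2>/dev/null)" ] && printf '%s\n' "$p"
        elif [ -s "$p" ]; then
            printf '%s\n' "$p"
        fi
    done
    return 0
}

# Typical use on each node, with the paths named in the thread:
#   leftover_paths /etc/cluster/cluster.conf /etc/ganesha/ganesha.conf \
#       /etc/ganesha/exports /var/lib/pacemaker/cib
```

No output means none of the listed paths has content left; it does not check whether the ganesha service itself is stopped.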
>>>>>>> This time (repeated twice) it did not work at all:
>>>>>>>
>>>>>>> # pcs status
>>>>>>> Cluster name: ATLAS_GANESHA_01
>>>>>>> Last updated: Tue Jun  9 10:13:43 2015
>>>>>>> Last change: Tue Jun  9 10:13:22 2015
>>>>>>> Stack: corosync
>>>>>>> Current DC: atlas-node1 (1) - partition with quorum
>>>>>>> Version: 1.1.12-a14efad
>>>>>>> 2 Nodes configured
>>>>>>> 6 Resources configured
>>>>>>>
>>>>>>>
>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>
>>>>>>> Full list of resources:
>>>>>>>
>>>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>> atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>> atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>
>>>>>>> PCSD Status:
>>>>>>>   atlas-node1: Online
>>>>>>>   atlas-node2: Online
>>>>>>>
>>>>>>> Daemon Status:
>>>>>>>   corosync: active/enabled
>>>>>>>   pacemaker: active/enabled
>>>>>>>   pcsd: active/enabled
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I then tried "pcs cluster destroy" on both nodes, and then again
>>>>>>> nfs-ganesha enable, but now I'm back to the old problem:
>>>>>>>
>>>>>>> # pcs status
>>>>>>> Cluster name: ATLAS_GANESHA_01
>>>>>>> Last updated: Tue Jun  9 10:22:27 2015
>>>>>>> Last change: Tue Jun  9 10:17:00 2015
>>>>>>> Stack: corosync
>>>>>>> Current DC: atlas-node2 (2) - partition with quorum
>>>>>>> Version: 1.1.12-a14efad
>>>>>>> 2 Nodes configured
>>>>>>> 10 Resources configured
>>>>>>>
>>>>>>>
>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>
>>>>>>> Full list of resources:
>>>>>>>
>>>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>> atlas-node1-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
>>>>>>> atlas-node1-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>> atlas-node2-cluster_ip-1  (ocf::heartbeat:IPaddr):        Stopped
>>>>>>> atlas-node2-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>> atlas-node1-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>> atlas-node2-dead_ip-1     (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>
>>>>>>> PCSD Status:
>>>>>>>   atlas-node1: Online
>>>>>>>   atlas-node2: Online
>>>>>>>
>>>>>>> Daemon Status:
>>>>>>>   corosync: active/enabled
>>>>>>>   pacemaker: active/enabled
>>>>>>>   pcsd: active/enabled
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Alessandro
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Soumya
>>>>>>>>
>>>>>>>>> Alessandro
>>>>>>>>>
>>>>>>>>>> On 08 Jun 2015, at 20:00, Alessandro De Salvo
>>>>>>>>>> <Alessandro.DeSalvo at roma1.infn.it> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> indeed, it does not work :-)
>>>>>>>>>> OK, this is what I did, with 2 machines, running CentOS 7.1,
>>>>>>>>>> Glusterfs 3.7.1 and nfs-ganesha 2.2.0:
>>>>>>>>>>
>>>>>>>>>> 1) ensured that the machines are able to resolve their IPs (but
>>>>>>>>>> this was already true since they were in the DNS);
>>>>>>>>>> 2) disabled NetworkManager and enabled network on both machines;
>>>>>>>>>> 3) created a gluster shared volume 'gluster_shared_storage' and
>>>>>>>>>> mounted it on '/run/gluster/shared_storage' on all the cluster
>>>>>>>>>> nodes using glusterfs native mount (on CentOS 7.1 there is a link
>>>>>>>>>> by default /var/run -> ../run)
>>>>>>>>>> 4) created an empty /etc/ganesha/ganesha.conf;
>>>>>>>>>> 5) installed pacemaker pcs resource-agents corosync on all cluster
>>>>>>>>>> machines;
>>>>>>>>>> 6) set the same password for the "hacluster" user on all machines;
>>>>>>>>>> 7) pcs cluster auth <hostname> -u hacluster -p <pass> on all the
>>>>>>>>>> nodes (on both nodes I issued the commands for both nodes)
>>>>>>>>>> 8) IPv6 is configured by default on all nodes, although the
>>>>>>>>>> infrastructure is not ready for IPv6
>>>>>>>>>> 9) enabled pcsd and started it on all nodes
>>>>>>>>>> 10) populated /etc/ganesha/ganesha-ha.conf with the following
>>>>>>>>>> contents, one per machine:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ===> atlas-node1
>>>>>>>>>> # Name of the HA cluster created.
>>>>>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>>>>>> # The server from which you intend to mount
>>>>>>>>>> # the shared volume.
>>>>>>>>>> HA_VOL_SERVER="atlas-node1"
>>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>>>>>> # is specified.
>>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>>>>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>>>>>> VIP_atlas-node1="x.x.x.1"
>>>>>>>>>> VIP_atlas-node2="x.x.x.2"
>>>>>>>>>>
>>>>>>>>>> ===> atlas-node2
>>>>>>>>>> # Name of the HA cluster created.
>>>>>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>>>>>> # The server from which you intend to mount
>>>>>>>>>> # the shared volume.
>>>>>>>>>> HA_VOL_SERVER="atlas-node2"
>>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>>>>>> # is specified.
>>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>>>>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>>>>>> VIP_atlas-node1="x.x.x.1"
>>>>>>>>>> VIP_atlas-node2="x.x.x.2"
>>>>>>>>>>
>>>>>>>>>> 11) issued gluster nfs-ganesha enable, but it fails with a cryptic
>>>>>>>>>> message:
>>>>>>>>>>
>>>>>>>>>> # gluster nfs-ganesha enable
>>>>>>>>>> Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the
>>>>>>>>>> trusted pool. Do you still want to continue? (y/n) y
>>>>>>>>>> nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha.
>>>>>>>>>> Please check the log file for details
>>>>>>>>>>
>>>>>>>>>> Looking at the logs I found nothing really special but this:
>>>>>>>>>>
>>>>>>>>>> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
>>>>>>>>>> [2015-06-08 17:57:15.672844] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
>>>>>>>>>> [2015-06-08 17:57:15.675395] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
>>>>>>>>>> [2015-06-08 17:57:15.720692] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
>>>>>>>>>> [2015-06-08 17:57:15.721161] I [glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host found Hostname is atlas-node2
>>>>>>>>>> [2015-06-08 17:57:16.633048] E [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management: Initial NFS-Ganesha set up failed
>>>>>>>>>> [2015-06-08 17:57:16.641563] E [glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of operation 'Volume (null)' failed on localhost : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
>>>>>>>>>>
>>>>>>>>>> ==> /var/log/glusterfs/cmd_history.log <==
>>>>>>>>>> [2015-06-08 17:57:16.643615]  : nfs-ganesha enable : FAILED : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
>>>>>>>>>>
>>>>>>>>>> ==> /var/log/glusterfs/cli.log <==
>>>>>>>>>> [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting with: -1
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Also, pcs seems to be fine for the auth part, although it obviously
>>>>>>>>>> tells me the cluster is not running.
>>>>>>>>>>
>>>>>>>>>> I, [2015-06-08T19:57:16.305323 #7223]  INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name
>>>>>>>>>> I, [2015-06-08T19:57:16.345457 #7223]  INFO -- : Running: /usr/sbin/pcs cluster token-nodes
>>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth HTTP/1.1" 200 68 0.1919
>>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth HTTP/1.1" 200 68 0.1920
>>>>>>>>>> atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET /remote/check_auth HTTP/1.1" 200 68
>>>>>>>>>> - -> /remote/check_auth
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What am I doing wrong?
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Alessandro
>>>>>>>>>>
>>>>>>>>>>> On 08 Jun 2015, at 19:30, Soumya Koduri
>>>>>>>>>>> <skoduri at redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
>>>>>>>>>>>> Sorry, just another question:
>>>>>>>>>>>>
>>>>>>>>>>>> - in my installation of gluster 3.7.1 the command gluster
>>>>>>>>>>>> features.ganesha enable does not work:
>>>>>>>>>>>>
>>>>>>>>>>>> # gluster features.ganesha enable
>>>>>>>>>>>> unrecognized word: features.ganesha (position 0)
>>>>>>>>>>>>
>>>>>>>>>>>> Which version has full support for it?
>>>>>>>>>>>
>>>>>>>>>>> Sorry. This option has recently been changed. It is now
>>>>>>>>>>>
>>>>>>>>>>> $ gluster nfs-ganesha enable
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> - in the documentation the ccs and cman packages are required,
>>>>>>>>>>>> but they seem not to be available anymore on CentOS 7 and
>>>>>>>>>>>> similar; I guess they are not really required anymore, as pcs
>>>>>>>>>>>> should do the full job
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Alessandro
>>>>>>>>>>>
>>>>>>>>>>> Looks like so from http://clusterlabs.org/quickstart-redhat.html.
>>>>>>>>>>> Let us know if it doesn't work.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Soumya
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> On 08 Jun 2015, at 15:09, Alessandro De Salvo
>>>>>>>>>>>>> <alessandro.desalvo at roma1.infn.it> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Great, many thanks Soumya!
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 08 Jun 2015, at 13:53, Soumya Koduri
>>>>>>>>>>>>>> <skoduri at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please find the slides of the demo video at [1]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We recommend using a distributed replica volume as the shared
>>>>>>>>>>>>>> volume for better data availability.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The size of the volume depends on the workload you may have. Since
>>>>>>>>>>>>>> it is used to maintain states of NLM/NFSv4 clients, you may
>>>>>>>>>>>>>> calculate the minimum size of the volume as the aggregate of
>>>>>>>>>>>>>> (typical_size_of'/var/lib/nfs'_directory +
>>>>>>>>>>>>>> ~4k*no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point)
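
The sizing rule above can be worked through with a few lines of shell arithmetic. This is a sketch under assumptions: "~4k" is read as roughly 4 KiB per client, "aggregate" is read as a sum over the NFS servers, and the helper name `shared_volume_min_kib` is hypothetical.

```shell
# Estimate a lower bound, in KiB, for the shared volume size.
# Arguments come in pairs, one pair per NFS server:
#   <size of /var/lib/nfs in KiB> <peak number of connected clients>
shared_volume_min_kib() (
    total=0
    while [ "$#" -ge 2 ]; do
        # Per server: directory size plus ~4 KiB of state per client.
        total=$((total + $1 + 4 * $2))
        shift 2
    done
    printf '%s\n' "$total"
)

# Example: two servers, each with a 512 KiB /var/lib/nfs,
# serving 100 and 150 clients at peak:
#   shared_volume_min_kib 512 100 512 150
# The directory size on a node can be measured with: du -sk /var/lib/nfs
```

The result is only a floor for the state data; the thread gives no guidance on how much headroom to add.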
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We shall document this feature soon in the gluster docs
>>>>>>>>>>>>>> as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Soumya
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> I have seen the demo video on ganesha HA,
>>>>>>>>>>>>>>> https://www.youtube.com/watch?v=Z4mvTQC-efM
>>>>>>>>>>>>>>> However there is no advice on the appropriate size of the
>>>>>>>>>>>>>>> shared volume. How is it really used, and what should be a
>>>>>>>>>>>>>>> reasonable size for it?
>>>>>>>>>>>>>>> Also, are the slides from the video available somewhere, as
>>>>>>>>>>>>>>> well as documentation on all this? I did not manage to find
>>>>>>>>>>>>>>> them.
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>
