Alessandro De Salvo
2015-Jun-10 19:07 UTC
[Gluster-users] Questions on ganesha HA and shared storage size
Hi, by looking at the connections I also see a strange problem: # netstat -ltaupn | grep 2049 tcp6 4 0 :::2049 :::* LISTEN 32080/ganesha.nfsd tcp6 1 0 x.x.x.2:2049 x.x.x.2:33285 CLOSE_WAIT - tcp6 1 0 127.0.0.1:2049 127.0.0.1:39555 CLOSE_WAIT - udp6 0 0 :::2049 :::* 32080/ganesha.nfsd Why is tcp6 used with an IPv4 address? On another machine, where ganesha 2.1.0 is running, I see that tcp is used, not tcp6. Could it be that the RPCs are always trying to use IPv6? That would be wrong. Thanks, Alessandro On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote: > > On 06/10/2015 05:49 AM, Alessandro De Salvo wrote: > > Hi, > > I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume, nfs-ganesha does not even start, complaining that it cannot bind the IPv6 rquota socket, but in fact there is nothing listening on IPv6, so this should not happen: > > > > tcp6 0 0 :::111 :::* LISTEN 7433/rpcbind > > tcp6 0 0 :::2224 :::* LISTEN 9054/ruby > > tcp6 0 0 :::22 :::* LISTEN 1248/sshd > > udp6 0 0 :::111 :::* 7433/rpcbind > > udp6 0 0 fe80::8c2:27ff:fef2:123 :::* 31238/ntpd > > udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd > > udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd > > udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd > > udp6 0 0 ::1:123 :::* 31238/ntpd > > udp6 0 0 fe80::5484:7aff:fef:123 :::* 31238/ntpd > > udp6 0 0 :::123 :::* 31238/ntpd > > udp6 0 0 :::824 :::* 7433/rpcbind > > > > The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following: > > > > > > 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use) > > 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue. 
> > 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded > > > > We have seen such issues with RPCBIND a few times. The NFS-Ganesha setup first > disables Gluster-NFS and then brings up the NFS-Ganesha service. Sometimes > there can be a delay or an issue with Gluster-NFS un-registering those > services, and when NFS-Ganesha tries to register on the same port, it > throws this error. Please try registering Rquota on a random port > using the config option below in "/etc/ganesha/ganesha.conf" > > NFS_Core_Param { > #Use a non-privileged port for RQuota > Rquota_Port = 4501; > } > > and clean up the '/var/cache/rpcbind/' directory before the setup. > > Thanks, > Soumya > > > > > Thanks, > > > > Alessandro > > > > > > > > > >> Il giorno 09/giu/2015, alle ore 18:37, Soumya Koduri <skoduri at redhat.com> ha scritto: > >> > >> > >> > >> On 06/09/2015 09:47 PM, Alessandro De Salvo wrote: > >>> Another update: the fact that I was unable to use vol set ganesha.enable > >>> was due to another bug in the ganesha scripts. In short, they are all > >>> using the following line to get the location of the conf file: > >>> > >>> CONF=$(cat /etc/sysconfig/ganesha | grep "CONFFILE" | cut -f 2 -d "=") > >>> > >>> First of all, by default there is no CONFFILE line in /etc/sysconfig/ganesha; second, there is a bug > >>> in that directive, as it works if I add > >>> in /etc/sysconfig/ganesha > >>> > >>> CONFFILE=/etc/ganesha/ganesha.conf > >>> > >>> but it fails if the same value is quoted > >>> > >>> CONFFILE="/etc/ganesha/ganesha.conf" > >>> > >>> It would be much better to use the following, which has a default as > >>> well: > >>> > >>> eval $(grep -F CONFFILE= /etc/sysconfig/ganesha) > >>> CONF=${CONFFILE:-/etc/ganesha/ganesha.conf} > >>> > >>> I'll update the bug report. > >>> Having said this... the last issue to tackle is the real problem with > >>> the ganesha.nfsd :-( > >> > >> Thanks. 
Could you try changing the log level to NIV_FULL_DEBUG in '/etc/sysconfig/ganesha' and check if anything gets logged in '/var/log/ganesha.log' or '/ganesha.log'. > >> > >> Thanks, > >> Soumya > >> > >>> Cheers, > >>> > >>> Alessandro > >>> > >>> > >>> On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote: > >>>> OK, I can confirm that the ganesha.nfsd process is actually not > >>>> answering the calls. Here is what I see: > >>>> > >>>> # rpcinfo -p > >>>> program vers proto port service > >>>> 100000 4 tcp 111 portmapper > >>>> 100000 3 tcp 111 portmapper > >>>> 100000 2 tcp 111 portmapper > >>>> 100000 4 udp 111 portmapper > >>>> 100000 3 udp 111 portmapper > >>>> 100000 2 udp 111 portmapper > >>>> 100024 1 udp 41594 status > >>>> 100024 1 tcp 53631 status > >>>> 100003 3 udp 2049 nfs > >>>> 100003 3 tcp 2049 nfs > >>>> 100003 4 udp 2049 nfs > >>>> 100003 4 tcp 2049 nfs > >>>> 100005 1 udp 58127 mountd > >>>> 100005 1 tcp 56301 mountd > >>>> 100005 3 udp 58127 mountd > >>>> 100005 3 tcp 56301 mountd > >>>> 100021 4 udp 46203 nlockmgr > >>>> 100021 4 tcp 41798 nlockmgr > >>>> 100011 1 udp 875 rquotad > >>>> 100011 1 tcp 875 rquotad > >>>> 100011 2 udp 875 rquotad > >>>> 100011 2 tcp 875 rquotad > >>>> > >>>> # netstat -lpn | grep ganesha > >>>> tcp6 14 0 :::2049 :::* > >>>> LISTEN 11937/ganesha.nfsd > >>>> tcp6 0 0 :::41798 :::* > >>>> LISTEN 11937/ganesha.nfsd > >>>> tcp6 0 0 :::875 :::* > >>>> LISTEN 11937/ganesha.nfsd > >>>> tcp6 10 0 :::56301 :::* > >>>> LISTEN 11937/ganesha.nfsd > >>>> tcp6 0 0 :::564 :::* > >>>> LISTEN 11937/ganesha.nfsd > >>>> udp6 0 0 :::2049 :::* > >>>> 11937/ganesha.nfsd > >>>> udp6 0 0 :::46203 :::* > >>>> 11937/ganesha.nfsd > >>>> udp6 0 0 :::58127 :::* > >>>> 11937/ganesha.nfsd > >>>> udp6 0 0 :::875 :::* > >>>> 11937/ganesha.nfsd > >>>> > >>>> I'm attaching the strace of a showmount from one node to the other. > >>>> This machinery was working with nfs-ganesha 2.1.0, so it must be > >>>> something introduced with 2.2.0. 
> >>>> Cheers, > >>>> > >>>> Alessandro > >>>> > >>>> > >>>> > >>>> On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote: > >>>>> > >>>>> On 06/09/2015 02:48 PM, Alessandro De Salvo wrote: > >>>>>> Hi, > >>>>>> OK, the problem with the VIPs not starting is due to the ganesha_mon > >>>>>> heartbeat script looking for a pid file called > >>>>>> /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0 is > >>>>>> creating /var/run/ganesha.pid, this needs to be corrected. The file is > >>>>>> in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case. > >>>>>> For the moment I have created a symlink in this way and it works: > >>>>>> > >>>>>> ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid > >>>>>> > >>>>> Thanks. Please update this as well in the bug. > >>>>> > >>>>>> So far so good, the VIPs are up and pingable, but still there is the > >>>>>> problem of the hanging showmount (i.e. hanging RPC). > >>>>>> Still, I see a lot of errors like this in /var/log/messages: > >>>>>> > >>>>>> Jun 9 11:15:20 atlas-node1 lrmd[31221]: notice: operation_finished: > >>>>>> nfs-mon_monitor_10000:29292:stderr [ Error: Resource does not exist. ] > >>>>>> > >>>>>> While ganesha.log shows the server is not in grace: > >>>>>> > >>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting: > >>>>>> Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at > >>>>>> May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org > >>>>>> <http://buildhw-09.phx2.fedoraproject.org> > >>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT > >>>>>> :Configuration file successfully parsed > >>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT > >>>>>> :Initializing ID Mapper. 
> >>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper > >>>>>> successfully initialized. > >>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries > >>>>>> found in configuration file !!! > >>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File > >>>>>> ((null):0): Empty configuration file > >>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT > >>>>>> :CAP_SYS_RESOURCE was successfully removed for proper quota management > >>>>>> in FSAL > >>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set > >>>>>> capabilities are: > >>>>>> cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep > >>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire > >>>>>> credentials for principal nfs > >>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin > >>>>>> thread initialized > >>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now > >>>>>> IN GRACE, duration 60 > >>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : > >>>>>> 
ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT > >>>>>> :Callback creds directory (/var/run/ganesha) already exists > >>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN > >>>>>> :gssd_refresh_krb5_machine_credential failed (2:2) > >>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting > >>>>>> delayed executor. > >>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP > >>>>>> dispatcher thread was started successfully > >>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P > >>>>>> dispatcher started > >>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT > >>>>>> :gsh_dbusthread was started successfully > >>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread > >>>>>> was started successfully > >>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread > >>>>>> was started successfully > >>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN > >>>>>> GRACE > >>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General > >>>>>> fridge was started successfully > >>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT > >>>>>> :------------------------------------------------- > >>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : > >>>>>> 
ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT : NFS > >>>>>> SERVER INITIALIZED > >>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT > >>>>>> :------------------------------------------------- > >>>>>> 09/06/2015 11:17:22 : epoch 5576aee4 : atlas-node1 : > >>>>>> ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now > >>>>>> NOT IN GRACE > >>>>>> > >>>>>> > >>>>> Please check the status of nfs-ganesha > >>>>> $service nfs-ganesha status > >>>>> > >>>>> Could you try taking a packet trace (during showmount or mount) and > >>>>> check the server responses. > >>>>> > >>>>> Thanks, > >>>>> Soumya > >>>>> > >>>>>> Cheers, > >>>>>> > >>>>>> Alessandro > >>>>>> > >>>>>> > >>>>>>> Il giorno 09/giu/2015, alle ore 10:36, Alessandro De Salvo > >>>>>>> <alessandro.desalvo at roma1.infn.it > >>>>>>> <mailto:alessandro.desalvo at roma1.infn.it>> ha scritto: > >>>>>>> > >>>>>>> Hi Soumya, > >>>>>>> > >>>>>>>> Il giorno 09/giu/2015, alle ore 08:06, Soumya Koduri > >>>>>>>> <skoduri at redhat.com <mailto:skoduri at redhat.com>> ha scritto: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On 06/09/2015 01:31 AM, Alessandro De Salvo wrote: > >>>>>>>>> OK, I found at least one of the bugs. > >>>>>>>>> The /usr/libexec/ganesha/ganesha.sh has the following lines: > >>>>>>>>> > >>>>>>>>> if [ -e /etc/os-release ]; then > >>>>>>>>> RHEL6_PCS_CNAME_OPTION="" > >>>>>>>>> fi > >>>>>>>>> > >>>>>>>>> This is OK for RHEL < 7, but does not work for >= 7. I have changed > >>>>>>>>> it to the following, to make it working: > >>>>>>>>> > >>>>>>>>> if [ -e /etc/os-release ]; then > >>>>>>>>> eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release) > >>>>>>>>> [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && > >>>>>>>>> RHEL6_PCS_CNAME_OPTION="" > >>>>>>>>> fi > >>>>>>>>> > >>>>>>>> Oh..Thanks for the fix. Could you please file a bug for the same (and > >>>>>>>> probably submit your fix as well). 
We shall have it corrected. > >>>>>>> > >>>>>>> Just did it, https://bugzilla.redhat.com/show_bug.cgi?id=1229601 > >>>>>>> > >>>>>>>> > >>>>>>>>> Apart from that, the VIP_<node> entries I was using were wrong, and I should > >>>>>>>>> have converted all the '-' to underscores; maybe this could be > >>>>>>>>> mentioned in the documentation when you have it ready. > >>>>>>>>> Now the cluster starts, but apparently the VIPs do not: > >>>>>>>>> > >>>>>>>> Sure. Thanks again for pointing it out. We shall make a note of it. > >>>>>>>> > >>>>>>>>> Online: [ atlas-node1 atlas-node2 ] > >>>>>>>>> > >>>>>>>>> Full list of resources: > >>>>>>>>> > >>>>>>>>> Clone Set: nfs-mon-clone [nfs-mon] > >>>>>>>>> Started: [ atlas-node1 atlas-node2 ] > >>>>>>>>> Clone Set: nfs-grace-clone [nfs-grace] > >>>>>>>>> Started: [ atlas-node1 atlas-node2 ] > >>>>>>>>> atlas-node1-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped > >>>>>>>>> atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 > >>>>>>>>> atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped > >>>>>>>>> atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 > >>>>>>>>> atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 > >>>>>>>>> atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 > >>>>>>>>> > >>>>>>>>> PCSD Status: > >>>>>>>>> atlas-node1: Online > >>>>>>>>> atlas-node2: Online > >>>>>>>>> > >>>>>>>>> Daemon Status: > >>>>>>>>> corosync: active/disabled > >>>>>>>>> pacemaker: active/disabled > >>>>>>>>> pcsd: active/enabled > >>>>>>>>> > >>>>>>>>> > >>>>>>>> Here corosync and pacemaker show the 'disabled' state. Can you check the > >>>>>>>> status of their services? They should be running prior to cluster > >>>>>>>> creation. We need to include that step in the document as well. 
> >>>>>>> > >>>>>>> Ah, OK, you're right, I have added it to my puppet modules (we install > >>>>>>> and configure ganesha via puppet, I'll put the module on puppetforge > >>>>>>> soon, in case anyone is interested). > >>>>>>> > >>>>>>>> > >>>>>>>>> But the issue that is puzzling me most is the following: > >>>>>>>>> > >>>>>>>>> # showmount -e localhost > >>>>>>>>> rpc mount export: RPC: Timed out > >>>>>>>>> > >>>>>>>>> And when I try to enable the ganesha exports on a volume I get this > >>>>>>>>> error: > >>>>>>>>> > >>>>>>>>> # gluster volume set atlas-home-01 ganesha.enable on > >>>>>>>>> volume set: failed: Failed to create NFS-Ganesha export config file. > >>>>>>>>> > >>>>>>>>> But I see the file created in /etc/ganesha/exports/*.conf > >>>>>>>>> Still, showmount hangs and times out. > >>>>>>>>> Any help? > >>>>>>>>> Thanks, > >>>>>>>>> > >>>>>>>> Hmm, that's strange. Sometimes, when there was no proper cleanup > >>>>>>>> done while trying to re-create the cluster, we have seen such issues. > >>>>>>>> > >>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1227709 > >>>>>>>> > >>>>>>>> http://review.gluster.org/#/c/11093/ > >>>>>>>> > >>>>>>>> Can you please unexport all the volumes, tear down the cluster using > >>>>>>>> 'gluster vol set <volname> ganesha.enable off' > >>>>>>> > >>>>>>> OK: > >>>>>>> > >>>>>>> # gluster vol set atlas-home-01 ganesha.enable off > >>>>>>> volume set: failed: ganesha.enable is already 'off'. > >>>>>>> > >>>>>>> # gluster vol set atlas-data-01 ganesha.enable off > >>>>>>> volume set: failed: ganesha.enable is already 'off'. > >>>>>>> > >>>>>>> > >>>>>>>> 'gluster ganesha disable' command. > >>>>>>> > >>>>>>> I'm assuming you wanted to write nfs-ganesha instead? > >>>>>>> > >>>>>>> # gluster nfs-ganesha disable > >>>>>>> ganesha enable : success > >>>>>>> > >>>>>>> > >>>>>>> A side note (not really important): it's strange that when I do a > >>>>>>> disable the message is 'ganesha enable' 
:-) > >>>>>>> > >>>>>>>> > >>>>>>>> Verify if the following files have been deleted on all the nodes - > >>>>>>>> '/etc/cluster/cluster.conf' > >>>>>>> > >>>>>>> this file is not present at all, I think it's not needed in CentOS 7 > >>>>>>> > >>>>>>>> '/etc/ganesha/ganesha.conf', > >>>>>>> > >>>>>>> it's still there, but empty, and I guess it should be OK, right? > >>>>>>> > >>>>>>>> '/etc/ganesha/exports/*' > >>>>>>> > >>>>>>> no more files there > >>>>>>> > >>>>>>>> '/var/lib/pacemaker/cib' > >>>>>>> > >>>>>>> it's empty > >>>>>>> > >>>>>>>> > >>>>>>>> Verify if the ganesha service is stopped on all the nodes. > >>>>>>> > >>>>>>> nope, it's still running, I will stop it. > >>>>>>> > >>>>>>>> > >>>>>>>> start/restart the services - corosync, pcs. > >>>>>>> > >>>>>>> On the node where I issued the nfs-ganesha disable there is no longer > >>>>>>> any /etc/corosync/corosync.conf, so corosync won't start. The other > >>>>>>> node instead still has the file, which is strange. > >>>>>>> > >>>>>>>> > >>>>>>>> And re-try the HA cluster creation > >>>>>>>> 'gluster ganesha enable' 
> >>>>>>> This time (repeated twice) it did not work at all: > >>>>>>> > >>>>>>> # pcs status > >>>>>>> Cluster name: ATLAS_GANESHA_01 > >>>>>>> Last updated: Tue Jun 9 10:13:43 2015 > >>>>>>> Last change: Tue Jun 9 10:13:22 2015 > >>>>>>> Stack: corosync > >>>>>>> Current DC: atlas-node1 (1) - partition with quorum > >>>>>>> Version: 1.1.12-a14efad > >>>>>>> 2 Nodes configured > >>>>>>> 6 Resources configured > >>>>>>> > >>>>>>> > >>>>>>> Online: [ atlas-node1 atlas-node2 ] > >>>>>>> > >>>>>>> Full list of resources: > >>>>>>> > >>>>>>> Clone Set: nfs-mon-clone [nfs-mon] > >>>>>>> Started: [ atlas-node1 atlas-node2 ] > >>>>>>> Clone Set: nfs-grace-clone [nfs-grace] > >>>>>>> Started: [ atlas-node1 atlas-node2 ] > >>>>>>> atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 > >>>>>>> atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 > >>>>>>> > >>>>>>> PCSD Status: > >>>>>>> atlas-node1: Online > >>>>>>> atlas-node2: Online > >>>>>>> > >>>>>>> Daemon Status: > >>>>>>> corosync: active/enabled > >>>>>>> pacemaker: active/enabled > >>>>>>> pcsd: active/enabled > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> I then tried "pcs cluster destroy" on both nodes, and then again > >>>>>>> nfs-ganesha enable, but now I'm back to the old problem: > >>>>>>> > >>>>>>> # pcs status > >>>>>>> Cluster name: ATLAS_GANESHA_01 > >>>>>>> Last updated: Tue Jun 9 10:22:27 2015 > >>>>>>> Last change: Tue Jun 9 10:17:00 2015 > >>>>>>> Stack: corosync > >>>>>>> Current DC: atlas-node2 (2) - partition with quorum > >>>>>>> Version: 1.1.12-a14efad > >>>>>>> 2 Nodes configured > >>>>>>> 10 Resources configured > >>>>>>> > >>>>>>> > >>>>>>> Online: [ atlas-node1 atlas-node2 ] > >>>>>>> > >>>>>>> Full list of resources: > >>>>>>> > >>>>>>> Clone Set: nfs-mon-clone [nfs-mon] > >>>>>>> Started: [ atlas-node1 atlas-node2 ] > >>>>>>> Clone Set: nfs-grace-clone [nfs-grace] > >>>>>>> Started: [ atlas-node1 atlas-node2 ] > >>>>>>> atlas-node1-cluster_ip-1 
(ocf::heartbeat:IPaddr): Stopped > >>>>>>> atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 > >>>>>>> atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped > >>>>>>> atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 > >>>>>>> atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 > >>>>>>> atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 > >>>>>>> > >>>>>>> PCSD Status: > >>>>>>> atlas-node1: Online > >>>>>>> atlas-node2: Online > >>>>>>> > >>>>>>> Daemon Status: > >>>>>>> corosync: active/enabled > >>>>>>> pacemaker: active/enabled > >>>>>>> pcsd: active/enabled > >>>>>>> > >>>>>>> > >>>>>>> Cheers, > >>>>>>> > >>>>>>> Alessandro > >>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Soumya > >>>>>>>> > >>>>>>>>> Alessandro > >>>>>>>>> > >>>>>>>>>> Il giorno 08/giu/2015, alle ore 20:00, Alessandro De Salvo > >>>>>>>>>> <Alessandro.DeSalvo at roma1.infn.it > >>>>>>>>>> <mailto:Alessandro.DeSalvo at roma1.infn.it>> ha scritto: > >>>>>>>>>> > >>>>>>>>>> Hi, > >>>>>>>>>> indeed, it does not work :-) > >>>>>>>>>> OK, this is what I did, with 2 machines, running CentOS 7.1, > >>>>>>>>>> Glusterfs 3.7.1 and nfs-ganesha 2.2.0: > >>>>>>>>>> > >>>>>>>>>> 1) ensured that the machines are able to resolve their IPs (but > >>>>>>>>>> this was already true since they were in the DNS); > >>>>>>>>>> 2) disabled NetworkManager and enabled network on both machines; > >>>>>>>>>> 3) created a gluster shared volume 'gluster_shared_storage' and > >>>>>>>>>> mounted it on '/run/gluster/shared_storage' on all the cluster > >>>>>>>>>> nodes using glusterfs native mount (on CentOS 7.1 there is a link > >>>>>>>>>> by default /var/run -> ../run) > >>>>>>>>>> 4) created an empty /etc/ganesha/ganesha.conf; > >>>>>>>>>> 5) installed pacemaker pcs resource-agents corosync on all cluster > >>>>>>>>>> machines; > >>>>>>>>>> 6) set the 'hacluster' 
user the same password on all machines; > >>>>>>>>>> 7) pcs cluster auth <hostname> -u hacluster -p <pass> on all the > >>>>>>>>>> nodes (on both nodes I issued the commands for both nodes) > >>>>>>>>>> 8) IPv6 is configured by default on all nodes, although the > >>>>>>>>>> infrastructure is not ready for IPv6 > >>>>>>>>>> 9) enabled pcsd and started it on all nodes > >>>>>>>>>> 10) populated /etc/ganesha/ganesha-ha.conf with the following > >>>>>>>>>> contents, one per machine: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> ===> atlas-node1 > >>>>>>>>>> # Name of the HA cluster created. > >>>>>>>>>> HA_NAME="ATLAS_GANESHA_01" > >>>>>>>>>> # The server from which you intend to mount > >>>>>>>>>> # the shared volume. > >>>>>>>>>> HA_VOL_SERVER="atlas-node1" > >>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool > >>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname > >>>>>>>>>> # is specified. > >>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2" > >>>>>>>>>> # Virtual IPs of each of the nodes specified above. > >>>>>>>>>> VIP_atlas-node1="x.x.x.1" > >>>>>>>>>> VIP_atlas-node2="x.x.x.2" > >>>>>>>>>> > >>>>>>>>>> ===> atlas-node2 > >>>>>>>>>> # Name of the HA cluster created. > >>>>>>>>>> HA_NAME="ATLAS_GANESHA_01" > >>>>>>>>>> # The server from which you intend to mount > >>>>>>>>>> # the shared volume. > >>>>>>>>>> HA_VOL_SERVER="atlas-node2" > >>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool > >>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname > >>>>>>>>>> # is specified. > >>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2" > >>>>>>>>>> # Virtual IPs of each of the nodes specified above. > >>>>>>>>>> VIP_atlas-node1="x.x.x.1" > >>>>>>>>>> VIP_atlas-node2="x.x.x.2" 
> >>>>>>>>>> > >>>>>>>>>> 11) issued gluster nfs-ganesha enable, but it fails with a cryptic > >>>>>>>>>> message: > >>>>>>>>>> > >>>>>>>>>> # gluster nfs-ganesha enable > >>>>>>>>>> Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the > >>>>>>>>>> trusted pool. Do you still want to continue? (y/n) y > >>>>>>>>>> nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha. > >>>>>>>>>> Please check the log file for details > >>>>>>>>>> > >>>>>>>>>> Looking at the logs I found nothing really special but this: > >>>>>>>>>> > >>>>>>>>>> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <== > >>>>>>>>>> [2015-06-08 17:57:15.672844] I [MSGID: 106132] > >>>>>>>>>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs > >>>>>>>>>> already stopped > >>>>>>>>>> [2015-06-08 17:57:15.675395] I > >>>>>>>>>> [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host > >>>>>>>>>> found Hostname is atlas-node2 > >>>>>>>>>> [2015-06-08 17:57:15.720692] I > >>>>>>>>>> [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host > >>>>>>>>>> found Hostname is atlas-node2 > >>>>>>>>>> [2015-06-08 17:57:15.721161] I > >>>>>>>>>> [glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host > >>>>>>>>>> found Hostname is atlas-node2 > >>>>>>>>>> [2015-06-08 17:57:16.633048] E > >>>>>>>>>> [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management: > >>>>>>>>>> Initial NFS-Ganesha set up failed > >>>>>>>>>> [2015-06-08 17:57:16.641563] E > >>>>>>>>>> [glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of > >>>>>>>>>> operation 'Volume (null)' failed on localhost : Failed to set up HA > >>>>>>>>>> config for NFS-Ganesha. Please check the log file for details > >>>>>>>>>> > >>>>>>>>>> ==> /var/log/glusterfs/cmd_history.log <== > >>>>>>>>>> [2015-06-08 17:57:16.643615] : nfs-ganesha enable : FAILED : > >>>>>>>>>> Failed to set up HA config for NFS-Ganesha. 
Please check the log > >>>>>>>>>> file for details > >>>>>>>>>> > >>>>>>>>>> ==> /var/log/glusterfs/cli.log <== > >>>>>>>>>> [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting > >>>>>>>>>> with: -1 > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Also, pcs seems to be fine for the auth part, although it obviously > >>>>>>>>>> tells me the cluster is not running. > >>>>>>>>>> > >>>>>>>>>> I, [2015-06-08T19:57:16.305323 #7223] INFO -- : Running: > >>>>>>>>>> /usr/sbin/corosync-cmapctl totem.cluster_name > >>>>>>>>>> I, [2015-06-08T19:57:16.345457 #7223] INFO -- : Running: > >>>>>>>>>> /usr/sbin/pcs cluster token-nodes > >>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET > >>>>>>>>>> /remote/check_auth HTTP/1.1" 200 68 0.1919 > >>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET > >>>>>>>>>> /remote/check_auth HTTP/1.1" 200 68 0.1920 > >>>>>>>>>> atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET > >>>>>>>>>> /remote/check_auth HTTP/1.1" 200 68 > >>>>>>>>>> - -> /remote/check_auth > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> What am I doing wrong? > >>>>>>>>>> Thanks, > >>>>>>>>>> > >>>>>>>>>> Alessandro > >>>>>>>>>> > >>>>>>>>>>> Il giorno 08/giu/2015, alle ore 19:30, Soumya Koduri > >>>>>>>>>>> <skoduri at redhat.com <mailto:skoduri at redhat.com>> ha scritto: > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 06/08/2015 08:20 PM, Alessandro De Salvo wrote: > >>>>>>>>>>>> Sorry, just another question: > >>>>>>>>>>>> > >>>>>>>>>>>> - in my installation of gluster 3.7.1 the command gluster > >>>>>>>>>>>> features.ganesha enable does not work: > >>>>>>>>>>>> > >>>>>>>>>>>> # gluster features.ganesha enable > >>>>>>>>>>>> unrecognized word: features.ganesha (position 0) > >>>>>>>>>>>> > >>>>>>>>>>>> Which version has full support for it? > >>>>>>>>>>> > >>>>>>>>>>> Sorry. This option has recently been changed. 
It is now > >>>>>>>>>>> > >>>>>>>>>>> $ gluster nfs-ganesha enable > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> - in the documentation the ccs and cman packages are required, > >>>>>>>>>>>> but they seems not to be available anymore on CentOS 7 and > >>>>>>>>>>>> similar, I guess they are not really required anymore, as pcs > >>>>>>>>>>>> should do the full job > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> > >>>>>>>>>>>> Alessandro > >>>>>>>>>>> > >>>>>>>>>>> Looks like so from http://clusterlabs.org/quickstart-redhat.html. > >>>>>>>>>>> Let us know if it doesn't work. > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> Soumya > >>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 15:09, Alessandro De Salvo > >>>>>>>>>>>>> <alessandro.desalvo at roma1.infn.it > >>>>>>>>>>>>> <mailto:alessandro.desalvo at roma1.infn.it>> ha scritto: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Great, many thanks Soumya! > >>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Alessandro > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 13:53, Soumya Koduri > >>>>>>>>>>>>>> <skoduri at redhat.com <mailto:skoduri at redhat.com>> ha scritto: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Please find the slides of the demo video at [1] > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> We recommend to have a distributed replica volume as a shared > >>>>>>>>>>>>>> volume for better data-availability. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Size of the volume depends on the workload you may have. Since > >>>>>>>>>>>>>> it is used to maintain states of NLM/NFSv4 clients, you may > >>>>>>>>>>>>>> calculate the size of the volume to be minimum of aggregate of > >>>>>>>>>>>>>> (typical_size_of'/var/lib/nfs'_directory + > >>>>>>>>>>>>>> ~4k*no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> We shall document about this feature sooner in the gluster docs > >>>>>>>>>>>>>> as well. 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>> Soumya > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 06/08/2015 04:34 PM, Alessandro De Salvo wrote: > >>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>> I have seen the demo video on ganesha HA, > >>>>>>>>>>>>>>> https://www.youtube.com/watch?v=Z4mvTQC-efM > >>>>>>>>>>>>>>> However there is no advice on the appropriate size of the > >>>>>>>>>>>>>>> shared volume. How is it really used, and what should be a > >>>>>>>>>>>>>>> reasonable size for it? > >>>>>>>>>>>>>>> Also, are the slides from the video available somewhere, as > >>>>>>>>>>>>>>> well as a documentation on all this? I did not manage to find > >>>>>>>>>>>>>>> them. > >>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Alessandro > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> _______________________________________________ > >>>>>>>>>>>>>>> Gluster-users mailing list > >>>>>>>>>>>>>>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org> > >>>>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users > >>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> _______________________________________________ > >>>>>>>>>> Gluster-users mailing list > >>>>>>>>>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org> > >>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users > >>>>>>>>> > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Gluster-users mailing list > >>>>>>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org> > >>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users > >>>>>> > >>>> > >>>> _______________________________________________ > >>>> Gluster-users mailing list > >>>> Gluster-users at gluster.org > >>>> http://www.gluster.org/mailman/listinfo/gluster-users > >>> > >>> > >
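[Editor's note: Soumya's sizing heuristic quoted in the thread above (size of '/var/lib/nfs' plus ~4k per client per NFS server) can be turned into a quick back-of-the-envelope calculation. The sketch below is illustrative only; all the numbers in it are assumed values, not measurements from this thread.]

```shell
# Rough sizing sketch for the ganesha shared volume, following the heuristic
# quoted above: per server, size of /var/lib/nfs plus ~4 KiB per connected
# client. Every figure here is an assumption chosen for illustration.
var_lib_nfs_kib=1024        # assume ~1 MiB of NLM/NFSv4 state per node
clients_per_server=500      # assumed peak client count per NFS-Ganesha head
servers=2                   # nodes in the HA cluster
total_kib=$(( servers * (var_lib_nfs_kib + 4 * clients_per_server) ))
echo "estimated minimum shared volume size: ${total_kib} KiB"
```

The result is a lower bound; in practice one would round up generously, since the shared volume is tiny compared to any data volume.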
Alessandro De Salvo
2015-Jun-11 15:48 UTC
[Gluster-users] Questions on ganesha HA and shared storage size
Soumya, do you have any other idea of what to check on my side? Many thanks, Alessandro> Il giorno 10/giu/2015, alle ore 21:07, Alessandro De Salvo <alessandro.desalvo at roma1.infn.it> ha scritto: > > Hi, > by looking at the connections I also see a strange problem: > > # netstat -ltaupn | grep 2049 > tcp6 4 0 :::2049 :::* > LISTEN 32080/ganesha.nfsd > tcp6 1 0 x.x.x.2:2049 x.x.x.2:33285 CLOSE_WAIT > - > tcp6 1 0 127.0.0.1:2049 127.0.0.1:39555 > CLOSE_WAIT - > udp6 0 0 :::2049 :::* > 32080/ganesha.nfsd > > > Why tcp6 is used with an IPv4 address? > In another machine where ganesha 2.1.0 is running I see tcp is used, not > tcp6. > Could it be that the RPC are always trying to use IPv6? That would be > wrong. > Thanks, > > Alessandro > > On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote: >> >> On 06/10/2015 05:49 AM, Alessandro De Salvo wrote: >>> Hi, >>> I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). 
If I do the same after exporting a volume nfs-ganesha does not even start, complaining for not being able to bind the IPv6 ruota socket, but in fact there is nothing listening on IPv6, so it should not happen: >>> >>> tcp6 0 0 :::111 :::* LISTEN 7433/rpcbind >>> tcp6 0 0 :::2224 :::* LISTEN 9054/ruby >>> tcp6 0 0 :::22 :::* LISTEN 1248/sshd >>> udp6 0 0 :::111 :::* 7433/rpcbind >>> udp6 0 0 fe80::8c2:27ff:fef2:123 :::* 31238/ntpd >>> udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd >>> udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd >>> udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd >>> udp6 0 0 ::1:123 :::* 31238/ntpd >>> udp6 0 0 fe80::5484:7aff:fef:123 :::* 31238/ntpd >>> udp6 0 0 :::123 :::* 31238/ntpd >>> udp6 0 0 :::824 :::* 7433/rpcbind >>> >>> The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following: >>> >>> >>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use) >>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue. >>> 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded >>> >> >> We have seen such issues with RPCBIND few times. NFS-Ganesha setup first >> disables Gluster-NFS and then brings up NFS-Ganesha service. Sometimes, >> there could be delay or issue with Gluster-NFS un-registering those >> services and when NFS-Ganesha tries to register to the same port, it >> throws this error. Please try registering Rquota to any random port >> using below config option in "/etc/ganesha/ganesha.conf" >> >> NFS_Core_Param { >> #Use a non-privileged port for RQuota >> Rquota_Port = 4501; >> } >> >> and cleanup '/var/cache/rpcbind/' directory before the setup. 
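The workaround Soumya describes can be sketched as follows. A scratch file stands in for /etc/ganesha/ganesha.conf so the sketch is runnable anywhere; on a real node you would append to the actual config and then run the privileged commands shown in the comments as root:

```shell
# Register RQuota on a non-privileged port so a stale rpcbind
# registration on 875 cannot block ganesha startup.
conf=$(mktemp)   # stands in for /etc/ganesha/ganesha.conf
cat >> "$conf" <<'EOF'
NFS_Core_Param {
        # Use a non-privileged port for RQuota
        Rquota_Port = 4501;
}
EOF
echo "appended NFS_Core_Param block to $conf"

# Then, as root on each node:
#   rm -rf /var/cache/rpcbind/*
#   systemctl restart rpcbind nfs-ganesha
```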
>> >> Thanks, >> Soumya >> >>> >>> Thanks, >>> >>> Alessandro >>> >>> >>> >>> >>>> Il giorno 09/giu/2015, alle ore 18:37, Soumya Koduri <skoduri at redhat.com> ha scritto: >>>> >>>> >>>> >>>> On 06/09/2015 09:47 PM, Alessandro De Salvo wrote: >>>>> Another update: the fact that I was unable to use vol set ganesha.enable >>>>> was due to another bug in the ganesha scripts. In short, they are all >>>>> using the following line to get the location of the conf file: >>>>> >>>>> CONF=$(cat /etc/sysconfig/ganesha | grep "CONFFILE" | cut -f 2 -d "=") >>>>> >>>>> First of all by default in /etc/sysconfig/ganesha there is no line >>>>> CONFFILE, second there is a bug in that directive, as it works if I add >>>>> in /etc/sysconfig/ganesha >>>>> >>>>> CONFFILE=/etc/ganesha/ganesha.conf >>>>> >>>>> but it fails if the same is quoted >>>>> >>>>> CONFFILE="/etc/ganesha/ganesha.conf" >>>>> >>>>> It would be much better to use the following, which has a default as >>>>> well: >>>>> >>>>> eval $(grep -F CONFFILE= /etc/sysconfig/ganesha) >>>>> CONF=${CONFFILE:-/etc/ganesha/ganesha.conf} >>>>> >>>>> I'll update the bug report. >>>>> Having said this... the last issue to tackle is the real problem with >>>>> the ganesha.nfsd :-( >>>> >>>> Thanks. Could you try changing log level to NIV_FULL_DEBUG in '/etc/sysconfig/ganesha' and check if anything gets logged in '/var/log/ganesha.log' or '/ganesha.log'. >>>> >>>> Thanks, >>>> Soumya >>>> >>>>> Cheers, >>>>> >>>>> Alessandro >>>>> >>>>> >>>>> On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote: >>>>>> OK, I can confirm that the ganesha.nfsd process is actually not >>>>>> answering to the calls. 
Here it is what I see: >>>>>> >>>>>> # rpcinfo -p >>>>>> program vers proto port service >>>>>> 100000 4 tcp 111 portmapper >>>>>> 100000 3 tcp 111 portmapper >>>>>> 100000 2 tcp 111 portmapper >>>>>> 100000 4 udp 111 portmapper >>>>>> 100000 3 udp 111 portmapper >>>>>> 100000 2 udp 111 portmapper >>>>>> 100024 1 udp 41594 status >>>>>> 100024 1 tcp 53631 status >>>>>> 100003 3 udp 2049 nfs >>>>>> 100003 3 tcp 2049 nfs >>>>>> 100003 4 udp 2049 nfs >>>>>> 100003 4 tcp 2049 nfs >>>>>> 100005 1 udp 58127 mountd >>>>>> 100005 1 tcp 56301 mountd >>>>>> 100005 3 udp 58127 mountd >>>>>> 100005 3 tcp 56301 mountd >>>>>> 100021 4 udp 46203 nlockmgr >>>>>> 100021 4 tcp 41798 nlockmgr >>>>>> 100011 1 udp 875 rquotad >>>>>> 100011 1 tcp 875 rquotad >>>>>> 100011 2 udp 875 rquotad >>>>>> 100011 2 tcp 875 rquotad >>>>>> >>>>>> # netstat -lpn | grep ganesha >>>>>> tcp6 14 0 :::2049 :::* >>>>>> LISTEN 11937/ganesha.nfsd >>>>>> tcp6 0 0 :::41798 :::* >>>>>> LISTEN 11937/ganesha.nfsd >>>>>> tcp6 0 0 :::875 :::* >>>>>> LISTEN 11937/ganesha.nfsd >>>>>> tcp6 10 0 :::56301 :::* >>>>>> LISTEN 11937/ganesha.nfsd >>>>>> tcp6 0 0 :::564 :::* >>>>>> LISTEN 11937/ganesha.nfsd >>>>>> udp6 0 0 :::2049 :::* >>>>>> 11937/ganesha.nfsd >>>>>> udp6 0 0 :::46203 :::* >>>>>> 11937/ganesha.nfsd >>>>>> udp6 0 0 :::58127 :::* >>>>>> 11937/ganesha.nfsd >>>>>> udp6 0 0 :::875 :::* >>>>>> 11937/ganesha.nfsd >>>>>> >>>>>> I'm attaching the strace of a showmount from a node to the other. >>>>>> This machinery was working with nfs-ganesha 2.1.0, so it must be >>>>>> something introduced with 2.2.0. 
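The quote-tolerant CONFFILE lookup suggested a couple of messages up can be sketched as a self-contained snippet. Note the `:-` (use-default) expansion: the plain `${CONFFILE:/path}` form is a bash substring expansion, not a default. A scratch file stands in for /etc/sysconfig/ganesha, seeded with the quoted value that broke the original `cut`-based parsing:

```shell
# Quote-tolerant lookup of CONFFILE with a sane default.
sysconfig=$(mktemp)   # stands in for /etc/sysconfig/ganesha
printf 'CONFFILE="/etc/ganesha/ganesha.conf"\n' > "$sysconfig"

unset CONFFILE
# eval lets the shell itself strip any quoting around the value.
[ -f "$sysconfig" ] && eval "$(grep -F 'CONFFILE=' "$sysconfig")"
CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
echo "CONF=$CONF"
```

The same lines work whether the sysconfig value is quoted, unquoted, or missing entirely (the default then applies).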
>>>>>> Cheers, >>>>>> >>>>>> Alessandro >>>>>> >>>>>> >>>>>> >>>>>> On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote: >>>>>>> >>>>>>> On 06/09/2015 02:48 PM, Alessandro De Salvo wrote: >>>>>>>> Hi, >>>>>>>> OK, the problem with the VIPs not starting is due to the ganesha_mon >>>>>>>> heartbeat script looking for a pid file called >>>>>>>> /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0 is >>>>>>>> creating /var/run/ganesha.pid, this needs to be corrected. The file is >>>>>>>> in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case. >>>>>>>> For the moment I have created a symlink in this way and it works: >>>>>>>> >>>>>>>> ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid >>>>>>>> >>>>>>> Thanks. Please update this as well in the bug. >>>>>>> >>>>>>>> So far so good, the VIPs are up and pingable, but still there is the >>>>>>>> problem of the hanging showmount (i.e. hanging RPC). >>>>>>>> Still, I see a lot of errors like this in /var/log/messages: >>>>>>>> >>>>>>>> Jun 9 11:15:20 atlas-node1 lrmd[31221]: notice: operation_finished: >>>>>>>> nfs-mon_monitor_10000:29292:stderr [ Error: Resource does not exist. ] >>>>>>>> >>>>>>>> While ganesha.log shows the server is not in grace: >>>>>>>> >>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting: >>>>>>>> Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at >>>>>>>> May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org >>>>>>>> <http://buildhw-09.phx2.fedoraproject.org> >>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT >>>>>>>> :Configuration file successfully parsed >>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT >>>>>>>> :Initializing ID Mapper. 
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper >>>>>>>> successfully initialized. >>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries >>>>>>>> found in configuration file !!! >>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File >>>>>>>> ((null):0): Empty configuration file >>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT >>>>>>>> :CAP_SYS_RESOURCE was successfully removed for proper quota management >>>>>>>> in FSAL >>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set >>>>>>>> capabilities are: >>>>>>>> cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep >>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire >>>>>>>> credentials for principal nfs >>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin >>>>>>>> thread initialized >>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now >>>>>>>> IN GRACE, duration 60 >>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : >>>>>>>> 
ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT >>>>>>>> :Callback creds directory (/var/run/ganesha) already exists >>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN >>>>>>>> :gssd_refresh_krb5_machine_credential failed (2:2) >>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting >>>>>>>> delayed executor. >>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP >>>>>>>> dispatcher thread was started successfully >>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P >>>>>>>> dispatcher started >>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT >>>>>>>> :gsh_dbusthread was started successfully >>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread >>>>>>>> was started successfully >>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread >>>>>>>> was started successfully >>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN >>>>>>>> GRACE >>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General >>>>>>>> fridge was started successfully >>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT >>>>>>>> :------------------------------------------------- >>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : >>>>>>>> 
ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT : NFS >>>>>>>> SERVER INITIALIZED >>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT >>>>>>>> :------------------------------------------------- >>>>>>>> 09/06/2015 11:17:22 : epoch 5576aee4 : atlas-node1 : >>>>>>>> ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now >>>>>>>> NOT IN GRACE >>>>>>>> >>>>>>>> >>>>>>> Please check the status of nfs-ganesha >>>>>>> $service nfs-ganesha status >>>>>>> >>>>>>> Could you try taking a packet trace (during showmount or mount) and >>>>>>> check the server responses. >>>>>>> >>>>>>> Thanks, >>>>>>> Soumya >>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> Alessandro >>>>>>>> >>>>>>>> >>>>>>>>> Il giorno 09/giu/2015, alle ore 10:36, Alessandro De Salvo >>>>>>>>> <alessandro.desalvo at roma1.infn.it >>>>>>>>> <mailto:alessandro.desalvo at roma1.infn.it>> ha scritto: >>>>>>>>> >>>>>>>>> Hi Soumya, >>>>>>>>> >>>>>>>>>> Il giorno 09/giu/2015, alle ore 08:06, Soumya Koduri >>>>>>>>>> <skoduri at redhat.com <mailto:skoduri at redhat.com>> ha scritto: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 06/09/2015 01:31 AM, Alessandro De Salvo wrote: >>>>>>>>>>> OK, I found at least one of the bugs. >>>>>>>>>>> The /usr/libexec/ganesha/ganesha.sh has the following lines: >>>>>>>>>>> >>>>>>>>>>> if [ -e /etc/os-release ]; then >>>>>>>>>>> RHEL6_PCS_CNAME_OPTION="" >>>>>>>>>>> fi >>>>>>>>>>> >>>>>>>>>>> This is OK for RHEL < 7, but does not work for >= 7. I have changed >>>>>>>>>>> it to the following, to make it working: >>>>>>>>>>> >>>>>>>>>>> if [ -e /etc/os-release ]; then >>>>>>>>>>> eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release) >>>>>>>>>>> [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && >>>>>>>>>>> RHEL6_PCS_CNAME_OPTION="" >>>>>>>>>>> fi >>>>>>>>>>> >>>>>>>>>> Oh..Thanks for the fix. Could you please file a bug for the same (and >>>>>>>>>> probably submit your fix as well). 
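Made self-contained, the os-release check from that ganesha.sh fix looks like the sketch below. The `--name` default is an assumption about the EL6-era value the HA script uses, and a scratch file stands in for /etc/os-release; the Fedora-only match mirrors the fix quoted above:

```shell
# Decide the pcs cluster-name option based on os-release contents.
os_release=$(mktemp)   # stands in for /etc/os-release
printf 'REDHAT_SUPPORT_PRODUCT="Fedora"\n' > "$os_release"

RHEL6_PCS_CNAME_OPTION="--name"   # assumed default for older pcs
if [ -e "$os_release" ]; then
    eval "$(grep -F 'REDHAT_SUPPORT_PRODUCT=' "$os_release")"
    if [ "$REDHAT_SUPPORT_PRODUCT" = "Fedora" ]; then
        RHEL6_PCS_CNAME_OPTION=""
    fi
fi
echo "RHEL6_PCS_CNAME_OPTION='$RHEL6_PCS_CNAME_OPTION'"
```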
We shall have it corrected. >>>>>>>>> >>>>>>>>> Just did it,https://bugzilla.redhat.com/show_bug.cgi?id=1229601 >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Apart from that, the VIP_<node> I was using were wrong, and I should >>>>>>>>>>> have converted all the ?-? to underscores, maybe this could be >>>>>>>>>>> mentioned in the documentation when you will have it ready. >>>>>>>>>>> Now, the cluster starts, but the VIPs apparently not: >>>>>>>>>>> >>>>>>>>>> Sure. Thanks again for pointing it out. We shall make a note of it. >>>>>>>>>> >>>>>>>>>>> Online: [ atlas-node1 atlas-node2 ] >>>>>>>>>>> >>>>>>>>>>> Full list of resources: >>>>>>>>>>> >>>>>>>>>>> Clone Set: nfs-mon-clone [nfs-mon] >>>>>>>>>>> Started: [ atlas-node1 atlas-node2 ] >>>>>>>>>>> Clone Set: nfs-grace-clone [nfs-grace] >>>>>>>>>>> Started: [ atlas-node1 atlas-node2 ] >>>>>>>>>>> atlas-node1-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped >>>>>>>>>>> atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 >>>>>>>>>>> atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped >>>>>>>>>>> atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 >>>>>>>>>>> atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 >>>>>>>>>>> atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 >>>>>>>>>>> >>>>>>>>>>> PCSD Status: >>>>>>>>>>> atlas-node1: Online >>>>>>>>>>> atlas-node2: Online >>>>>>>>>>> >>>>>>>>>>> Daemon Status: >>>>>>>>>>> corosync: active/disabled >>>>>>>>>>> pacemaker: active/disabled >>>>>>>>>>> pcsd: active/enabled >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Here corosync and pacemaker shows 'disabled' state. Can you check the >>>>>>>>>> status of their services. They should be running prior to cluster >>>>>>>>>> creation. We need to include that step in document as well. 
>>>>>>>>> Ah, OK, you're right, I have added it to my puppet modules (we install >>>>>>>>> and configure ganesha via puppet, I'll put the module on puppetforge >>>>>>>>> soon, in case anyone is interested). >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> But the issue that is puzzling me more is the following: >>>>>>>>>>> >>>>>>>>>>> # showmount -e localhost >>>>>>>>>>> rpc mount export: RPC: Timed out >>>>>>>>>>> >>>>>>>>>>> And when I try to enable the ganesha exports on a volume I get this >>>>>>>>>>> error: >>>>>>>>>>> >>>>>>>>>>> # gluster volume set atlas-home-01 ganesha.enable on >>>>>>>>>>> volume set: failed: Failed to create NFS-Ganesha export config file. >>>>>>>>>>> >>>>>>>>>>> But I see the file created in /etc/ganesha/exports/*.conf >>>>>>>>>>> Still, showmount hangs and times out. >>>>>>>>>>> Any help? >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>> Hmm that's strange. Sometimes, if there was no proper cleanup >>>>>>>>>> done while trying to re-create the cluster, we have seen such issues. >>>>>>>>>> >>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1227709 >>>>>>>>>> >>>>>>>>>> http://review.gluster.org/#/c/11093/ >>>>>>>>>> >>>>>>>>>> Can you please unexport all the volumes, teardown the cluster using >>>>>>>>>> 'gluster vol set <volname> ganesha.enable off' >>>>>>>>> >>>>>>>>> OK: >>>>>>>>> >>>>>>>>> # gluster vol set atlas-home-01 ganesha.enable off >>>>>>>>> volume set: failed: ganesha.enable is already 'off'. >>>>>>>>> >>>>>>>>> # gluster vol set atlas-data-01 ganesha.enable off >>>>>>>>> volume set: failed: ganesha.enable is already 'off'. >>>>>>>>> >>>>>>>>> >>>>>>>>>> 'gluster ganesha disable' command. >>>>>>>>> >>>>>>>>> I'm assuming you wanted to write nfs-ganesha instead? >>>>>>>>> >>>>>>>>> # gluster nfs-ganesha disable >>>>>>>>> ganesha enable : success >>>>>>>>> >>>>>>>>> >>>>>>>>> A side note (not really important): it's strange that when I do a >>>>>>>>> disable the message is 'ganesha enable' 
:-) >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Verify if the following files have been deleted on all the nodes- >>>>>>>>>> '/etc/cluster/cluster.conf' >>>>>>>>> >>>>>>>>> this file is not present at all, I think it's not needed in CentOS 7 >>>>>>>>> >>>>>>>>>> '/etc/ganesha/ganesha.conf', >>>>>>>>> >>>>>>>>> it's still there, but empty, and I guess it should be OK, right? >>>>>>>>> >>>>>>>>>> '/etc/ganesha/exports/*' >>>>>>>>> >>>>>>>>> no more files there >>>>>>>>> >>>>>>>>>> '/var/lib/pacemaker/cib' >>>>>>>>> >>>>>>>>> it's empty >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Verify if the ganesha service is stopped on all the nodes. >>>>>>>>> >>>>>>>>> nope, it's still running, I will stop it. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> start/restart the services - corosync, pcs. >>>>>>>>> >>>>>>>>> In the node where I issued the nfs-ganesha disable there is no more >>>>>>>>> any /etc/corosync/corosync.conf so corosync won't start. The other >>>>>>>>> node instead still has the file, it's strange. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> And re-try the HA cluster creation >>>>>>>>>> 'gluster ganesha enable' 
>>>>>>>>> >>>>>>>>> This time (repeated twice) it did not work at all: >>>>>>>>> >>>>>>>>> # pcs status >>>>>>>>> Cluster name: ATLAS_GANESHA_01 >>>>>>>>> Last updated: Tue Jun 9 10:13:43 2015 >>>>>>>>> Last change: Tue Jun 9 10:13:22 2015 >>>>>>>>> Stack: corosync >>>>>>>>> Current DC: atlas-node1 (1) - partition with quorum >>>>>>>>> Version: 1.1.12-a14efad >>>>>>>>> 2 Nodes configured >>>>>>>>> 6 Resources configured >>>>>>>>> >>>>>>>>> >>>>>>>>> Online: [ atlas-node1 atlas-node2 ] >>>>>>>>> >>>>>>>>> Full list of resources: >>>>>>>>> >>>>>>>>> Clone Set: nfs-mon-clone [nfs-mon] >>>>>>>>> Started: [ atlas-node1 atlas-node2 ] >>>>>>>>> Clone Set: nfs-grace-clone [nfs-grace] >>>>>>>>> Started: [ atlas-node1 atlas-node2 ] >>>>>>>>> atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 >>>>>>>>> atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 >>>>>>>>> >>>>>>>>> PCSD Status: >>>>>>>>> atlas-node1: Online >>>>>>>>> atlas-node2: Online >>>>>>>>> >>>>>>>>> Daemon Status: >>>>>>>>> corosync: active/enabled >>>>>>>>> pacemaker: active/enabled >>>>>>>>> pcsd: active/enabled >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I tried then "pcs cluster destroy" on both nodes, and then again >>>>>>>>> nfs-ganesha enable, but now I?m back to the old problem: >>>>>>>>> >>>>>>>>> # pcs status >>>>>>>>> Cluster name: ATLAS_GANESHA_01 >>>>>>>>> Last updated: Tue Jun 9 10:22:27 2015 >>>>>>>>> Last change: Tue Jun 9 10:17:00 2015 >>>>>>>>> Stack: corosync >>>>>>>>> Current DC: atlas-node2 (2) - partition with quorum >>>>>>>>> Version: 1.1.12-a14efad >>>>>>>>> 2 Nodes configured >>>>>>>>> 10 Resources configured >>>>>>>>> >>>>>>>>> >>>>>>>>> Online: [ atlas-node1 atlas-node2 ] >>>>>>>>> >>>>>>>>> Full list of resources: >>>>>>>>> >>>>>>>>> Clone Set: nfs-mon-clone [nfs-mon] >>>>>>>>> Started: [ atlas-node1 atlas-node2 ] >>>>>>>>> Clone Set: nfs-grace-clone [nfs-grace] >>>>>>>>> Started: [ atlas-node1 atlas-node2 ] >>>>>>>>> atlas-node1-cluster_ip-1 
(ocf::heartbeat:IPaddr): Stopped >>>>>>>>> atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 >>>>>>>>> atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped >>>>>>>>> atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 >>>>>>>>> atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 >>>>>>>>> atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 >>>>>>>>> >>>>>>>>> PCSD Status: >>>>>>>>> atlas-node1: Online >>>>>>>>> atlas-node2: Online >>>>>>>>> >>>>>>>>> Daemon Status: >>>>>>>>> corosync: active/enabled >>>>>>>>> pacemaker: active/enabled >>>>>>>>> pcsd: active/enabled >>>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> Alessandro >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Soumya >>>>>>>>>> >>>>>>>>>>> Alessandro >>>>>>>>>>> >>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 20:00, Alessandro De Salvo >>>>>>>>>>>> <Alessandro.DeSalvo at roma1.infn.it >>>>>>>>>>>> <mailto:Alessandro.DeSalvo at roma1.infn.it>> ha scritto: >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> indeed, it does not work :-) >>>>>>>>>>>> OK, this is what I did, with 2 machines, running CentOS 7.1, >>>>>>>>>>>> Glusterfs 3.7.1 and nfs-ganesha 2.2.0: >>>>>>>>>>>> >>>>>>>>>>>> 1) ensured that the machines are able to resolve their IPs (but >>>>>>>>>>>> this was already true since they were in the DNS); >>>>>>>>>>>> 2) disabled NetworkManager and enabled network on both machines; >>>>>>>>>>>> 3) created a gluster shared volume 'gluster_shared_storage' and >>>>>>>>>>>> mounted it on '/run/gluster/shared_storage' on all the cluster >>>>>>>>>>>> nodes using glusterfs native mount (on CentOS 7.1 there is a link >>>>>>>>>>>> by default /var/run -> ../run) >>>>>>>>>>>> 4) created an empty /etc/ganesha/ganesha.conf; >>>>>>>>>>>> 5) installed pacemaker pcs resource-agents corosync on all cluster >>>>>>>>>>>> machines; >>>>>>>>>>>> 6) set the 'hacluster' 
user the same password on all machines; >>>>>>>>>>>> 7) pcs cluster auth <hostname> -u hacluster -p <pass> on all the >>>>>>>>>>>> nodes (on both nodes I issued the commands for both nodes) >>>>>>>>>>>> 8) IPv6 is configured by default on all nodes, although the >>>>>>>>>>>> infrastructure is not ready for IPv6 >>>>>>>>>>>> 9) enabled pcsd and started it on all nodes >>>>>>>>>>>> 10) populated /etc/ganesha/ganesha-ha.conf with the following >>>>>>>>>>>> contents, one per machine: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ===> atlas-node1 >>>>>>>>>>>> # Name of the HA cluster created. >>>>>>>>>>>> HA_NAME="ATLAS_GANESHA_01" >>>>>>>>>>>> # The server from which you intend to mount >>>>>>>>>>>> # the shared volume. >>>>>>>>>>>> HA_VOL_SERVER="atlas-node1" >>>>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool >>>>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname >>>>>>>>>>>> # is specified. >>>>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2" >>>>>>>>>>>> # Virtual IPs of each of the nodes specified above. >>>>>>>>>>>> VIP_atlas-node1="x.x.x.1" >>>>>>>>>>>> VIP_atlas-node2="x.x.x.2" >>>>>>>>>>>> >>>>>>>>>>>> ===> atlas-node2 >>>>>>>>>>>> # Name of the HA cluster created. >>>>>>>>>>>> HA_NAME="ATLAS_GANESHA_01" >>>>>>>>>>>> # The server from which you intend to mount >>>>>>>>>>>> # the shared volume. >>>>>>>>>>>> HA_VOL_SERVER="atlas-node2" >>>>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool >>>>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname >>>>>>>>>>>> # is specified. >>>>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2" >>>>>>>>>>>> # Virtual IPs of each of the nodes specified above. >>>>>>>>>>>> VIP_atlas-node1="x.x.x.1" >>>>>>>>>>>> VIP_atlas-node2="x.x.x.2" 
>>>>>>>>>>>> >>>>>>>>>>>> 11) issued gluster nfs-ganesha enable, but it fails with a cryptic >>>>>>>>>>>> message: >>>>>>>>>>>> >>>>>>>>>>>> # gluster nfs-ganesha enable >>>>>>>>>>>> Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the >>>>>>>>>>>> trusted pool. Do you still want to continue? (y/n) y >>>>>>>>>>>> nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha. >>>>>>>>>>>> Please check the log file for details >>>>>>>>>>>> >>>>>>>>>>>> Looking at the logs I found nothing really special but this: >>>>>>>>>>>> >>>>>>>>>>>> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <=>>>>>>>>>>>> [2015-06-08 17:57:15.672844] I [MSGID: 106132] >>>>>>>>>>>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs >>>>>>>>>>>> already stopped >>>>>>>>>>>> [2015-06-08 17:57:15.675395] I >>>>>>>>>>>> [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host >>>>>>>>>>>> found Hostname is atlas-node2 >>>>>>>>>>>> [2015-06-08 17:57:15.720692] I >>>>>>>>>>>> [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host >>>>>>>>>>>> found Hostname is atlas-node2 >>>>>>>>>>>> [2015-06-08 17:57:15.721161] I >>>>>>>>>>>> [glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host >>>>>>>>>>>> found Hostname is atlas-node2 >>>>>>>>>>>> [2015-06-08 17:57:16.633048] E >>>>>>>>>>>> [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management: >>>>>>>>>>>> Initial NFS-Ganesha set up failed >>>>>>>>>>>> [2015-06-08 17:57:16.641563] E >>>>>>>>>>>> [glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of >>>>>>>>>>>> operation 'Volume (null)' failed on localhost : Failed to set up HA >>>>>>>>>>>> config for NFS-Ganesha. Please check the log file for details >>>>>>>>>>>> >>>>>>>>>>>> ==> /var/log/glusterfs/cmd_history.log <=>>>>>>>>>>>> [2015-06-08 17:57:16.643615] : nfs-ganesha enable : FAILED : >>>>>>>>>>>> Failed to set up HA config for NFS-Ganesha. 
Please check the log >>>>>>>>>>>> file for details >>>>>>>>>>>> >>>>>>>>>>>> ==> /var/log/glusterfs/cli.log <=>>>>>>>>>>>> [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting >>>>>>>>>>>> with: -1 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Also, pcs seems to be fine for the auth part, although it obviously >>>>>>>>>>>> tells me the cluster is not running. >>>>>>>>>>>> >>>>>>>>>>>> I, [2015-06-08T19:57:16.305323 #7223] INFO -- : Running: >>>>>>>>>>>> /usr/sbin/corosync-cmapctl totem.cluster_name >>>>>>>>>>>> I, [2015-06-08T19:57:16.345457 #7223] INFO -- : Running: >>>>>>>>>>>> /usr/sbin/pcs cluster token-nodes >>>>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET >>>>>>>>>>>> /remote/check_auth HTTP/1.1" 200 68 0.1919 >>>>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET >>>>>>>>>>>> /remote/check_auth HTTP/1.1" 200 68 0.1920 >>>>>>>>>>>> atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET >>>>>>>>>>>> /remote/check_auth HTTP/1.1" 200 68 >>>>>>>>>>>> - -> /remote/check_auth >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> What am I doing wrong? >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Alessandro >>>>>>>>>>>> >>>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 19:30, Soumya Koduri >>>>>>>>>>>>> <skoduri at redhat.com <mailto:skoduri at redhat.com>> ha scritto: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 06/08/2015 08:20 PM, Alessandro De Salvo wrote: >>>>>>>>>>>>>> Sorry, just another question: >>>>>>>>>>>>>> >>>>>>>>>>>>>> - in my installation of gluster 3.7.1 the command gluster >>>>>>>>>>>>>> features.ganesha enable does not work: >>>>>>>>>>>>>> >>>>>>>>>>>>>> # gluster features.ganesha enable >>>>>>>>>>>>>> unrecognized word: features.ganesha (position 0) >>>>>>>>>>>>>> >>>>>>>>>>>>>> Which version has full support for it? >>>>>>>>>>>>> >>>>>>>>>>>>> Sorry. This option has recently been changed. 
It is now >>>>>>>>>>>>> >>>>>>>>>>>>> $ gluster nfs-ganesha enable >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> - in the documentation the ccs and cman packages are required, >>>>>>>>>>>>>> but they seems not to be available anymore on CentOS 7 and >>>>>>>>>>>>>> similar, I guess they are not really required anymore, as pcs >>>>>>>>>>>>>> should do the full job >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Alessandro >>>>>>>>>>>>> >>>>>>>>>>>>> Looks like so from http://clusterlabs.org/quickstart-redhat.html. >>>>>>>>>>>>> Let us know if it doesn't work. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Soumya >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 15:09, Alessandro De Salvo >>>>>>>>>>>>>>> <alessandro.desalvo at roma1.infn.it >>>>>>>>>>>>>>> <mailto:alessandro.desalvo at roma1.infn.it>> ha scritto: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Great, many thanks Soumya! >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Alessandro >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 13:53, Soumya Koduri >>>>>>>>>>>>>>>> <skoduri at redhat.com <mailto:skoduri at redhat.com>> ha scritto: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Please find the slides of the demo video at [1] >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We recommend to have a distributed replica volume as a shared >>>>>>>>>>>>>>>> volume for better data-availability. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Size of the volume depends on the workload you may have. Since >>>>>>>>>>>>>>>> it is used to maintain states of NLM/NFSv4 clients, you may >>>>>>>>>>>>>>>> calculate the size of the volume to be minimum of aggregate of >>>>>>>>>>>>>>>> (typical_size_of'/var/lib/nfs'_directory + >>>>>>>>>>>>>>>> ~4k*no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We shall document about this feature sooner in the gluster docs >>>>>>>>>>>>>>>> as well. 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1770 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150611/c92cc441/attachment.p7s>
Soumya Koduri
2015-Jun-11 16:16 UTC
[Gluster-users] Questions on ganesha HA and shared storage size
CCing ganesha-devel to get more inputs.

When IPv6 is enabled, only v6 interfaces are used by NFS-Ganesha. Commit
d7e8f255 ('git show d7e8f255'), which was added in v2.2, has more details.

> # netstat -ltaupn | grep 2049
> tcp6       4      0 :::2049           :::*              LISTEN      32080/ganesha.nfsd
> tcp6       1      0 x.x.x.2:2049      x.x.x.2:33285     CLOSE_WAIT  -
> tcp6       1      0 127.0.0.1:2049    127.0.0.1:39555   CLOSE_WAIT  -
> udp6       0      0 :::2049           :::*                          32080/ganesha.nfsd

Looks like (from both the logs and the netstat output) there was a shutdown
request even before the server had come out of the grace period:

10/06/2015 01:58:53 : epoch 55777da1 : node2 : ganesha.nfsd-20696[work-6] nfs_rpc_dequeue_req :DISP :F_DBG :dequeue_req try qpair REQ_Q_LOW_LATENCY 0x7fdf8dc67b00:0x7fdf8dc67b68
10/06/2015 01:58:53 : epoch 55777da1 : node2 : ganesha.nfsd-20696[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
......
10/06/2015 01:58:55 : epoch 55777da1 : node2 : ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of poll loop
10/06/2015 01:58:55 : epoch 55777da1 : node2 : ganesha.nfsd-20696[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
10/06/2015 01:58:55 : epoch 55777da1 : node2 : ganesha.nfsd-20696[work-12] nfs_rpc_consume_req :DISP :F_DBG :try splice, qpair REQ_Q_LOW_LATENCY consumer qsize=0 producer qsize=0
......
10/06/2015 01:59:52 : epoch 55777da1 : node2 : ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of poll loop
10/06/2015 01:59:52 : epoch 55777da1 : node2 : ganesha.nfsd-20696[Admin] do_shutdown :MAIN :EVENT :NFS EXIT: stopping NFS service
.......
10/06/2015 02:00:00 : epoch 55777da1 : node2 : ganesha.nfsd-20696[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
10/06/2015 02:00:00 : epoch 55777da1 : node2 : ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of poll loop

When you observe the hang, please take 'gstack <ganesha_pid>' output and
post it in the mail.
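[Editorial note on the "tcp6 with an IPv4 address" question discussed in this thread: on Linux, an AF_INET6 wildcard listener with IPV6_V6ONLY disabled also serves IPv4 clients, and those peers are reported as IPv4-mapped IPv6 addresses, so netstat lists the socket under tcp6 even for IPv4 traffic. A minimal, generic sketch of that behaviour (plain socket code, not NFS-Ganesha's):]

```python
import socket

# Dual-stack listener: AF_INET6 wildcard with IPV6_V6ONLY disabled,
# analogous to ganesha's ":::2049" socket in the netstat output.
srv = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
srv.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
srv.bind(("::", 0))               # port 0: let the kernel pick a free port
srv.listen(1)
port = srv.getsockname()[1]

# A plain IPv4 client connecting to the "tcp6" listener.
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", port))
conn, peer = srv.accept()
print(peer[0])                    # IPv4-mapped form, e.g. ::ffff:127.0.0.1
conn.close(); cli.close(); srv.close()
```

So a "tcp6" line whose endpoints look like IPv4 addresses does not by itself mean the RPCs are going over IPv6; it can simply be IPv4 traffic on a dual-stack socket.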
Thanks,
Soumya

On 06/11/2015 12:37 AM, Alessandro De Salvo wrote:
> Hi,
> by looking at the connections I also see a strange problem:
>
> # netstat -ltaupn | grep 2049
> tcp6       4      0 :::2049           :::*              LISTEN      32080/ganesha.nfsd
> tcp6       1      0 x.x.x.2:2049      x.x.x.2:33285     CLOSE_WAIT  -
> tcp6       1      0 127.0.0.1:2049    127.0.0.1:39555   CLOSE_WAIT  -
> udp6       0      0 :::2049           :::*                          32080/ganesha.nfsd
>
> Why tcp6 is used with an IPv4 address?
> In another machine where ganesha 2.1.0 is running I see tcp is used, not
> tcp6.
> Could it be that the RPC are always trying to use IPv6? That would be
> wrong.
> Thanks,
>
> Alessandro
>
> On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote:
>>
>> On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:
>>> Hi,
>>> I have enabled the full debug already, but I see nothing special. Before
>>> exporting any volume the log shows no error, even when I do a showmount
>>> (the log is attached, ganesha.log.gz). If I do the same after exporting
>>> a volume nfs-ganesha does not even start, complaining for not being able
>>> to bind the IPv6 rquota socket, but in fact there is nothing listening
>>> on IPv6, so it should not happen:
>>>
>>> tcp6       0      0 :::111                   :::*      LISTEN      7433/rpcbind
>>> tcp6       0      0 :::2224                  :::*      LISTEN      9054/ruby
>>> tcp6       0      0 :::22                    :::*      LISTEN      1248/sshd
>>> udp6       0      0 :::111                   :::*                  7433/rpcbind
>>> udp6       0      0 fe80::8c2:27ff:fef2:123  :::*                  31238/ntpd
>>> udp6       0      0 fe80::230:48ff:fed2:123  :::*                  31238/ntpd
>>> udp6       0      0 fe80::230:48ff:fed2:123  :::*                  31238/ntpd
>>> udp6       0      0 fe80::230:48ff:fed2:123  :::*                  31238/ntpd
>>> udp6       0      0 ::1:123                  :::*                  31238/ntpd
>>> udp6       0      0 fe80::5484:7aff:fef:123  :::*                  31238/ntpd
>>> udp6       0      0 :::123                   :::*                  31238/ntpd
>>> udp6       0      0 :::824                   :::*                  7433/rpcbind
>>>
>>> The error, as shown in the attached ganesha-after-export.log.gz logfile,
>>> is the following:
>>>
>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
>>> 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
>>>
>>
>> We have seen such issues with RPCBIND few times. NFS-Ganesha setup first
>> disables Gluster-NFS and then brings up NFS-Ganesha service. Sometimes,
>> there could be delay or issue with Gluster-NFS un-registering those
>> services and when NFS-Ganesha tries to register to the same port, it
>> throws this error. Please try registering Rquota to any random port
>> using below config option in "/etc/ganesha/ganesha.conf"
>>
>> NFS_Core_Param {
>>         #Use a non-privileged port for RQuota
>>         Rquota_Port = 4501;
>> }
>>
>> and cleanup '/var/cache/rpcbind/' directory before the setup.
>>
>> Thanks,
>> Soumya
>>
>>> Thanks,
>>>
>>> Alessandro
>>>
>>>> Il giorno 09/giu/2015, alle ore 18:37, Soumya Koduri <skoduri at redhat.com> ha scritto:
>>>>
>>>> On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:
>>>>> Another update: the fact that I was unable to use vol set ganesha.enable
>>>>> was due to another bug in the ganesha scripts. In short, they are all
>>>>> using the following line to get the location of the conf file:
>>>>>
>>>>> CONF=$(cat /etc/sysconfig/ganesha | grep "CONFFILE" | cut -f 2 -d "=")
>>>>>
>>>>> First of all by default in /etc/sysconfig/ganesha there is no line
>>>>> CONFFILE, second there is a bug in that directive, as it works if I add
>>>>> in /etc/sysconfig/ganesha
>>>>>
>>>>> CONFFILE=/etc/ganesha/ganesha.conf
>>>>>
>>>>> but it fails if the same is quoted
>>>>>
>>>>> CONFFILE="/etc/ganesha/ganesha.conf"
>>>>>
>>>>> It would be much better to use the following, which has a default as
>>>>> well:
>>>>>
>>>>> eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
>>>>> CONF=${CONFFILE:/etc/ganesha/ganesha.conf}
>>>>>
>>>>> I'll update the bug report.
>>>>> Having said this... the last issue to tackle is the real problem with
>>>>> the ganesha.nfsd :-(
>>>>
>>>> Thanks. Could you try changing log level to NIV_FULL_DEBUG in
>>>> '/etc/sysconfig/ganesha' and check if anything gets logged in
>>>> '/var/log/ganesha.log' or '/ganesha.log'.
>>>>
>>>> Thanks,
>>>> Soumya
>>>>
>>>>> Cheers,
>>>>>
>>>>> Alessandro
>>>>>
>>>>> On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:
>>>>>> OK, I can confirm that the ganesha.nsfd process is actually not
>>>>>> answering to the calls. Here it is what I see:
>>>>>>
>>>>>> # rpcinfo -p
>>>>>>    program vers proto   port  service
>>>>>>     100000    4   tcp    111  portmapper
>>>>>>     100000    3   tcp    111  portmapper
>>>>>>     100000    2   tcp    111  portmapper
>>>>>>     100000    4   udp    111  portmapper
>>>>>>     100000    3   udp    111  portmapper
>>>>>>     100000    2   udp    111  portmapper
>>>>>>     100024    1   udp  41594  status
>>>>>>     100024    1   tcp  53631  status
>>>>>>     100003    3   udp   2049  nfs
>>>>>>     100003    3   tcp   2049  nfs
>>>>>>     100003    4   udp   2049  nfs
>>>>>>     100003    4   tcp   2049  nfs
>>>>>>     100005    1   udp  58127  mountd
>>>>>>     100005    1   tcp  56301  mountd
>>>>>>     100005    3   udp  58127  mountd
>>>>>>     100005    3   tcp  56301  mountd
>>>>>>     100021    4   udp  46203  nlockmgr
>>>>>>     100021    4   tcp  41798  nlockmgr
>>>>>>     100011    1   udp    875  rquotad
>>>>>>     100011    1   tcp    875  rquotad
>>>>>>     100011    2   udp    875  rquotad
>>>>>>     100011    2   tcp    875  rquotad
>>>>>>
>>>>>> # netstat -lpn | grep ganesha
>>>>>> tcp6      14      0 :::2049      :::*      LISTEN      11937/ganesha.nfsd
>>>>>> tcp6       0      0 :::41798     :::*      LISTEN      11937/ganesha.nfsd
>>>>>> tcp6       0      0 :::875       :::*      LISTEN      11937/ganesha.nfsd
>>>>>> tcp6      10      0 :::56301     :::*      LISTEN      11937/ganesha.nfsd
>>>>>> tcp6       0      0 :::564       :::*      LISTEN      11937/ganesha.nfsd
>>>>>> udp6       0      0 :::2049      :::*                  11937/ganesha.nfsd
>>>>>> udp6       0      0 :::46203     :::*                  11937/ganesha.nfsd
>>>>>> udp6       0      0 :::58127     :::*                  11937/ganesha.nfsd
>>>>>> udp6       0      0 :::875       :::*                  11937/ganesha.nfsd
>>>>>>
>>>>>> I'm attaching the strace of a showmount from a node to the other.
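[Editorial note: the "error 98 (Address already in use)" in the quoted ganesha log is plain EADDRINUSE at bind() time, which is why clearing the stale rpcbind registration helps. A minimal, generic sketch of that failure mode (not ganesha code):]

```python
import errno
import socket

# Generic sketch of the failure quoted above: a second bind() to a port
# that is still held fails with EADDRINUSE -- the same "Address already
# in use" that ganesha reports for its RQUOTA socket.
holder = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
holder.bind(("::", 0))            # kernel picks a free port; we now hold it
port = holder.getsockname()[1]

latecomer = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
caught = None
try:
    latecomer.bind(("::", port))  # port still held by `holder`
except OSError as exc:
    caught = exc.errno            # errno 98 (EADDRINUSE) on Linux
latecomer.close()
holder.close()
print(caught == errno.EADDRINUSE)
```

The `Rquota_Port = 4501` workaround suggested above simply moves ganesha's RQUOTA listener to a port nothing else is holding.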
>>>>>> This machinery was working with nfs-ganesha 2.1.0, so it must be
>>>>>> something introduced with 2.2.0.
>>>>>> Cheers,
>>>>>>
>>>>>> Alessandro
>>>>>>
>>>>>> On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:
>>>>>>>
>>>>>>> On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
>>>>>>>> Hi,
>>>>>>>> OK, the problem with the VIPs not starting is due to the ganesha_mon
>>>>>>>> heartbeat script looking for a pid file called
>>>>>>>> /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0 is
>>>>>>>> creating /var/run/ganesha.pid, this needs to be corrected. The file is
>>>>>>>> in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
>>>>>>>> For the moment I have created a symlink in this way and it works:
>>>>>>>>
>>>>>>>> ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid
>>>>>>>>
>>>>>>> Thanks. Please update this as well in the bug.
>>>>>>>
>>>>>>>> So far so good, the VIPs are up and pingable, but still there is the
>>>>>>>> problem of the hanging showmount (i.e. hanging RPC).
>>>>>>>> Still, I see a lot of errors like this in /var/log/messages:
>>>>>>>>
>>>>>>>> Jun  9 11:15:20 atlas-node1 lrmd[31221]: notice: operation_finished:
>>>>>>>> nfs-mon_monitor_10000:29292:stderr [ Error: Resource does not exist. ]
>>>>>>>>
>>>>>>>> While ganesha.log shows the server is not in grace:
>>>>>>>>
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!!
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File ((null):0): Empty configuration file
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (2:2)
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
>>>>>>>> 09/06/2015 11:17:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
>>>>>>>>
>>>>>>> Please check the status of nfs-ganesha
>>>>>>> $ service nfs-ganesha status
>>>>>>>
>>>>>>> Could you try taking a packet trace (during showmount or mount) and
>>>>>>> check the server responses.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Soumya
>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Alessandro
>>>>>>>>
>>>>>>>>> Il giorno 09/giu/2015, alle ore 10:36, Alessandro De Salvo
>>>>>>>>> <alessandro.desalvo at roma1.infn.it> ha scritto:
>>>>>>>>>
>>>>>>>>> Hi Soumya,
>>>>>>>>>
>>>>>>>>>> Il giorno 09/giu/2015, alle ore 08:06, Soumya Koduri
>>>>>>>>>> <skoduri at redhat.com> ha scritto:
>>>>>>>>>>
>>>>>>>>>> On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
>>>>>>>>>>> OK, I found at least one of the bugs.
>>>>>>>>>>> The /usr/libexec/ganesha/ganesha.sh has the following lines:
>>>>>>>>>>>
>>>>>>>>>>> if [ -e /etc/os-release ]; then
>>>>>>>>>>>         RHEL6_PCS_CNAME_OPTION=""
>>>>>>>>>>> fi
>>>>>>>>>>>
>>>>>>>>>>> This is OK for RHEL < 7, but does not work for >= 7. I have changed
>>>>>>>>>>> it to the following, to make it working:
>>>>>>>>>>>
>>>>>>>>>>> if [ -e /etc/os-release ]; then
>>>>>>>>>>>         eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
>>>>>>>>>>>         [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
>>>>>>>>>>> fi
>>>>>>>>>>>
>>>>>>>>>> Oh..Thanks for the fix. Could you please file a bug for the same (and
>>>>>>>>>> probably submit your fix as well). We shall have it corrected.
>>>>>>>>>
>>>>>>>>> Just did it, https://bugzilla.redhat.com/show_bug.cgi?id=1229601
>>>>>>>>>
>>>>>>>>>>> Apart from that, the VIP_<node> I was using were wrong, and I should
>>>>>>>>>>> have converted all the '-' to underscores, maybe this could be
>>>>>>>>>>> mentioned in the documentation when you will have it ready.
>>>>>>>>>>> Now, the cluster starts, but the VIPs apparently not:
>>>>>>>>>>>
>>>>>>>>>> Sure. Thanks again for pointing it out. We shall make a note of it.
>>>>>>>>>>
>>>>>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>>>>>
>>>>>>>>>>> Full list of resources:
>>>>>>>>>>>
>>>>>>>>>>>  Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>>>>  Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>>>>  atlas-node1-cluster_ip-1   (ocf::heartbeat:IPaddr):  Stopped
>>>>>>>>>>>  atlas-node1-trigger_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node1
>>>>>>>>>>>  atlas-node2-cluster_ip-1   (ocf::heartbeat:IPaddr):  Stopped
>>>>>>>>>>>  atlas-node2-trigger_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node2
>>>>>>>>>>>  atlas-node1-dead_ip-1      (ocf::heartbeat:Dummy):   Started atlas-node1
>>>>>>>>>>>  atlas-node2-dead_ip-1      (ocf::heartbeat:Dummy):   Started atlas-node2
>>>>>>>>>>>
>>>>>>>>>>> PCSD Status:
>>>>>>>>>>>   atlas-node1: Online
>>>>>>>>>>>   atlas-node2: Online
>>>>>>>>>>>
>>>>>>>>>>> Daemon Status:
>>>>>>>>>>>   corosync: active/disabled
>>>>>>>>>>>   pacemaker: active/disabled
>>>>>>>>>>>   pcsd: active/enabled
>>>>>>>>>>>
>>>>>>>>>> Here corosync and pacemaker shows 'disabled' state. Can you check the
>>>>>>>>>> status of their services. They should be running prior to cluster
>>>>>>>>>> creation. We need to include that step in document as well.
>>>>>>>>>
>>>>>>>>> Ah, OK, you're right, I have added it to my puppet modules (we install
>>>>>>>>> and configure ganesha via puppet, I'll put the module on puppetforge
>>>>>>>>> soon, in case anyone is interested).
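[Editorial note on the VIP_<node> keys discussed in this thread: shell variable names cannot contain '-', so a node name such as atlas-node1 has to be flattened to underscores in ganesha-ha.conf. A generic sketch of the conversion (the exact script logic is an assumption, not taken from ganesha-ha.sh):]

```python
# Generic sketch: derive the ganesha-ha.conf key for a node's virtual IP.
# Shell identifiers may not contain "-", so dashes in the hostname must
# become underscores -- the mistake described in the thread.
def vip_key(node_name: str) -> str:
    return "VIP_" + node_name.replace("-", "_")

print(vip_key("atlas-node1"))  # VIP_atlas_node1
```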
>>>>>>>>>>> But the issue that is puzzling me more is the following:
>>>>>>>>>>>
>>>>>>>>>>> # showmount -e localhost
>>>>>>>>>>> rpc mount export: RPC: Timed out
>>>>>>>>>>>
>>>>>>>>>>> And when I try to enable the ganesha exports on a volume I get this
>>>>>>>>>>> error:
>>>>>>>>>>>
>>>>>>>>>>> # gluster volume set atlas-home-01 ganesha.enable on
>>>>>>>>>>> volume set: failed: Failed to create NFS-Ganesha export config file.
>>>>>>>>>>>
>>>>>>>>>>> But I see the file created in /etc/ganesha/exports/*.conf
>>>>>>>>>>> Still, showmount hangs and times out.
>>>>>>>>>>> Any help?
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>> Hmm that's strange. Sometimes, in case if there was no proper cleanup
>>>>>>>>>> done while trying to re-create the cluster, we have seen such issues.
>>>>>>>>>>
>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1227709
>>>>>>>>>>
>>>>>>>>>> http://review.gluster.org/#/c/11093/
>>>>>>>>>>
>>>>>>>>>> Can you please unexport all the volumes, teardown the cluster using
>>>>>>>>>> 'gluster vol set <volname> ganesha.enable off'
>>>>>>>>>
>>>>>>>>> OK:
>>>>>>>>>
>>>>>>>>> # gluster vol set atlas-home-01 ganesha.enable off
>>>>>>>>> volume set: failed: ganesha.enable is already 'off'.
>>>>>>>>>
>>>>>>>>> # gluster vol set atlas-data-01 ganesha.enable off
>>>>>>>>> volume set: failed: ganesha.enable is already 'off'.
>>>>>>>>>
>>>>>>>>>> 'gluster ganesha disable' command.
>>>>>>>>>
>>>>>>>>> I'm assuming you wanted to write nfs-ganesha instead?
>>>>>>>>>
>>>>>>>>> # gluster nfs-ganesha disable
>>>>>>>>> ganesha enable : success
>>>>>>>>>
>>>>>>>>> A side note (not really important): it's strange that when I do a
>>>>>>>>> disable the message is 'ganesha enable' :-)
>>>>>>>>>
>>>>>>>>>> Verify if the following files have been deleted on all the nodes-
>>>>>>>>>> '/etc/cluster/cluster.conf'
>>>>>>>>>
>>>>>>>>> this file is not present at all, I think it's not needed in CentOS 7
>>>>>>>>>
>>>>>>>>>> '/etc/ganesha/ganesha.conf',
>>>>>>>>>
>>>>>>>>> it's still there, but empty, and I guess it should be OK, right?
>>>>>>>>>
>>>>>>>>>> '/etc/ganesha/exports/*'
>>>>>>>>>
>>>>>>>>> no more files there
>>>>>>>>>
>>>>>>>>>> '/var/lib/pacemaker/cib'
>>>>>>>>>
>>>>>>>>> it's empty
>>>>>>>>>
>>>>>>>>>> Verify if the ganesha service is stopped on all the nodes.
>>>>>>>>>
>>>>>>>>> nope, it's still running, I will stop it.
>>>>>>>>>
>>>>>>>>>> start/restart the services - corosync, pcs.
>>>>>>>>>
>>>>>>>>> In the node where I issued the nfs-ganesha disable there is no more
>>>>>>>>> any /etc/corosync/corosync.conf so corosync won't start. The other
>>>>>>>>> node instead still has the file, it's strange.
>>>>>>>>>
>>>>>>>>>> And re-try the HA cluster creation
>>>>>>>>>> 'gluster ganesha enable'
>>>>>>>>>
>>>>>>>>> This time (repeated twice) it did not work at all:
>>>>>>>>>
>>>>>>>>> # pcs status
>>>>>>>>> Cluster name: ATLAS_GANESHA_01
>>>>>>>>> Last updated: Tue Jun  9 10:13:43 2015
>>>>>>>>> Last change: Tue Jun  9 10:13:22 2015
>>>>>>>>> Stack: corosync
>>>>>>>>> Current DC: atlas-node1 (1) - partition with quorum
>>>>>>>>> Version: 1.1.12-a14efad
>>>>>>>>> 2 Nodes configured
>>>>>>>>> 6 Resources configured
>>>>>>>>>
>>>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>>>
>>>>>>>>> Full list of resources:
>>>>>>>>>
>>>>>>>>>  Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>>  Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>>  atlas-node2-dead_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node1
>>>>>>>>>  atlas-node1-dead_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node2
>>>>>>>>>
>>>>>>>>> PCSD Status:
>>>>>>>>>   atlas-node1: Online
>>>>>>>>>   atlas-node2: Online
>>>>>>>>>
>>>>>>>>> Daemon Status:
>>>>>>>>>   corosync: active/enabled
>>>>>>>>>   pacemaker: active/enabled
>>>>>>>>>   pcsd: active/enabled
>>>>>>>>>
>>>>>>>>> I tried then "pcs cluster destroy" on both nodes, and then again
>>>>>>>>> nfs-ganesha enable, but now I'm back to the old problem:
>>>>>>>>>
>>>>>>>>> # pcs status
>>>>>>>>> Cluster name: ATLAS_GANESHA_01
>>>>>>>>> Last updated: Tue Jun  9 10:22:27 2015
>>>>>>>>> Last change: Tue Jun  9 10:17:00 2015
>>>>>>>>> Stack: corosync
>>>>>>>>> Current DC: atlas-node2 (2) - partition with quorum
>>>>>>>>> Version: 1.1.12-a14efad
>>>>>>>>> 2 Nodes configured
>>>>>>>>> 10 Resources configured
>>>>>>>>>
>>>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>>>
>>>>>>>>> Full list of resources:
>>>>>>>>>
>>>>>>>>>  Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>>  Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>>>      Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>>  atlas-node1-cluster_ip-1   (ocf::heartbeat:IPaddr):  Stopped
>>>>>>>>>  atlas-node1-trigger_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node1
>>>>>>>>>  atlas-node2-cluster_ip-1   (ocf::heartbeat:IPaddr):  Stopped
>>>>>>>>>  atlas-node2-trigger_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node2
>>>>>>>>>  atlas-node1-dead_ip-1      (ocf::heartbeat:Dummy):   Started atlas-node1
>>>>>>>>>  atlas-node2-dead_ip-1      (ocf::heartbeat:Dummy):   Started atlas-node2
>>>>>>>>>
>>>>>>>>> PCSD Status:
>>>>>>>>>   atlas-node1: Online
>>>>>>>>>   atlas-node2: Online
>>>>>>>>>
>>>>>>>>> Daemon Status:
>>>>>>>>>   corosync: active/enabled
>>>>>>>>>   pacemaker: active/enabled
>>>>>>>>>   pcsd: active/enabled
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Alessandro
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Soumya
>>>>>>>>>>
>>>>>>>>>>> Alessandro
>>>>>>>>>>>
>>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 20:00, Alessandro De Salvo
>>>>>>>>>>>> <Alessandro.DeSalvo at roma1.infn.it> ha scritto:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> indeed, it does not work :-)
>>>>>>>>>>>> OK, this is what I did, with 2 machines, running CentOS 7.1,
>>>>>>>>>>>> Glusterfs 3.7.1 and nfs-ganesha 2.2.0:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) ensured that the machines are able to resolve their IPs (but
>>>>>>>>>>>> this was already true since they were in the DNS);
>>>>>>>>>>>> 2) disabled NetworkManager and enabled network on both machines;
>>>>>>>>>>>> 3) created a gluster shared volume 'gluster_shared_storage' and
>>>>>>>>>>>> mounted it on '/run/gluster/shared_storage' on all the cluster
>>>>>>>>>>>> nodes using glusterfs native mount (on CentOS 7.1 there is a link
>>>>>>>>>>>> by default /var/run -> ../run)
>>>>>>>>>>>> 4) created an empty /etc/ganesha/ganesha.conf;
>>>>>>>>>>>> 5) installed pacemaker pcs resource-agents corosync on all cluster
>>>>>>>>>>>> machines;
>>>>>>>>>>>> 6) set the 'hacluster' user the same password on all machines;
>>>>>>>>>>>> 7) pcs cluster auth <hostname> -u hacluster -p <pass> on all the
>>>>>>>>>>>> nodes (on both nodes I issued the commands for both nodes)
>>>>>>>>>>>> 8) IPv6 is configured by default on all nodes, although the
>>>>>>>>>>>> infrastructure is not ready for IPv6
>>>>>>>>>>>> 9) enabled pcsd and started it on all nodes
>>>>>>>>>>>> 10) populated /etc/ganesha/ganesha-ha.conf with the following
>>>>>>>>>>>> contents, one per machine:
>>>>>>>>>>>>
>>>>>>>>>>>> ===> atlas-node1
>>>>>>>>>>>> # Name of the HA cluster created.
>>>>>>>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>>>>>>>> # The server from which you intend to mount
>>>>>>>>>>>> # the shared volume.
>>>>>>>>>>>> HA_VOL_SERVER="atlas-node1"
>>>>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>>>>>>>> # is specified.
>>>>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>>>>>>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>>>>>>>> VIP_atlas-node1="x.x.x.1"
>>>>>>>>>>>> VIP_atlas-node2="x.x.x.2"
>>>>>>>>>>>>
>>>>>>>>>>>> ===> atlas-node2
>>>>>>>>>>>> # Name of the HA cluster created.
>>>>>>>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>>>>>>>> # The server from which you intend to mount
>>>>>>>>>>>> # the shared volume.
>>>>>>>>>>>> HA_VOL_SERVER="atlas-node2"
>>>>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>>>>>>>> # is specified.
>>>>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>>>>>>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>>>>>>>> VIP_atlas-node1="x.x.x.1"
>>>>>>>>>>>> VIP_atlas-node2="x.x.x.2"
>>>>>>>>>>>>
>>>>>>>>>>>> 11) issued gluster nfs-ganesha enable, but it fails with a cryptic
>>>>>>>>>>>> message:
>>>>>>>>>>>>
>>>>>>>>>>>> # gluster nfs-ganesha enable
>>>>>>>>>>>> Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the
>>>>>>>>>>>> trusted pool. Do you still want to continue? (y/n) y
>>>>>>>>>>>> nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha.
>>>>>>>>>>>> Please check the log file for details
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at the logs I found nothing really special but this:
>>>>>>>>>>>>
>>>>>>>>>>>> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
>>>>>>>>>>>> [2015-06-08 17:57:15.672844] I [MSGID: 106132]
>>>>>>>>>>>> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs
>>>>>>>>>>>> already stopped
>>>>>>>>>>>> [2015-06-08 17:57:15.675395] I
>>>>>>>>>>>> [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
>>>>>>>>>>>> found Hostname is atlas-node2
>>>>>>>>>>>> [2015-06-08 17:57:15.720692] I
>>>>>>>>>>>> [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host
>>>>>>>>>>>> found Hostname is atlas-node2
>>>>>>>>>>>> [2015-06-08 17:57:15.721161] I
>>>>>>>>>>>> [glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host
>>>>>>>>>>>> found Hostname is atlas-node2
>>>>>>>>>>>> [2015-06-08 17:57:16.633048] E
>>>>>>>>>>>> [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management:
>>>>>>>>>>>> Initial NFS-Ganesha set up failed
>>>>>>>>>>>> [2015-06-08 17:57:16.641563] E
>>>>>>>>>>>> [glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of
>>>>>>>>>>>> operation 'Volume (null)' failed on localhost : Failed to set up HA
>>>>>>>>>>>> config for NFS-Ganesha. Please check the log file for details
>>>>>>>>>>>>
>>>>>>>>>>>> ==> /var/log/glusterfs/cmd_history.log <==
>>>>>>>>>>>> [2015-06-08 17:57:16.643615]  : nfs-ganesha enable : FAILED :
>>>>>>>>>>>> Failed to set up HA config for NFS-Ganesha. Please check the log
>>>>>>>>>>>> file for details
>>>>>>>>>>>>
>>>>>>>>>>>> ==> /var/log/glusterfs/cli.log <==
>>>>>>>>>>>> [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting
>>>>>>>>>>>> with: -1
>>>>>>>>>>>>
>>>>>>>>>>>> Also, pcs seems to be fine for the auth part, although it obviously
>>>>>>>>>>>> tells me the cluster is not running.
>>>>>>>>>>>>
>>>>>>>>>>>> I, [2015-06-08T19:57:16.305323 #7223]  INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name
>>>>>>>>>>>> I, [2015-06-08T19:57:16.345457 #7223]  INFO -- : Running: /usr/sbin/pcs cluster token-nodes
>>>>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth HTTP/1.1" 200 68 0.1919
>>>>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth HTTP/1.1" 200 68 0.1920
>>>>>>>>>>>> atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET /remote/check_auth HTTP/1.1" 200 68
>>>>>>>>>>>> - -> /remote/check_auth
>>>>>>>>>>>>
>>>>>>>>>>>> What am I doing wrong?
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>
>>>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 19:30, Soumya Koduri
>>>>>>>>>>>>> <skoduri at redhat.com> ha scritto:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
>>>>>>>>>>>>>> Sorry, just another question:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - in my installation of gluster 3.7.1 the command gluster
>>>>>>>>>>>>>> features.ganesha enable does not work:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # gluster features.ganesha enable
>>>>>>>>>>>>>> unrecognized word: features.ganesha (position 0)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Which version has full support for it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry. This option has recently been changed. It is now
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ gluster nfs-ganesha enable
>>>>>>>>>>>>>
>>>>>>>>>>>>>> - in the documentation the ccs and cman packages are required,
>>>>>>>>>>>>>> but they seems not to be available anymore on CentOS 7 and
>>>>>>>>>>>>>> similar, I guess they are not really required anymore, as pcs
>>>>>>>>>>>>>> should do the full job
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looks like so from http://clusterlabs.org/quickstart-redhat.html.
>>>>>>>>>>>>> Let us know if it doesn't work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Soumya
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 15:09, Alessandro De Salvo
>>>>>>>>>>>>>>> <alessandro.desalvo at roma1.infn.it> ha scritto:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Great, many thanks Soumya!
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Il giorno 08/giu/2015, alle ore 13:53, Soumya Koduri
>>>>>>>>>>>>>>>> <skoduri at redhat.com> ha scritto:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please find the slides of the demo video at [1]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We recommend to have a distributed replica volume as a shared
>>>>>>>>>>>>>>>> volume for better data-availability.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Size of the volume depends on the workload you may have. Since
>>>>>>>>>>>>>>>> it is used to maintain states of NLM/NFSv4 clients, you may
>>>>>>>>>>>>>>>> calculate the size of the volume to be minimum of aggregate of
>>>>>>>>>>>>>>>> (typical_size_of'/var/lib/nfs'_directory +
>>>>>>>>>>>>>>>> ~4k*no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We shall document about this feature sooner in the gluster docs
>>>>>>>>>>>>>>>> as well.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Soumya
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> I have seen the demo video on ganesha HA,
>>>>>>>>>>>>>>>>> https://www.youtube.com/watch?v=Z4mvTQC-efM
>>>>>>>>>>>>>>>>> However there is no advice on the appropriate size of the
>>>>>>>>>>>>>>>>> shared volume. How is it really used, and what should be a
>>>>>>>>>>>>>>>>> reasonable size for it?
>>>>>>>>>>>>>>>>> Also, are the slides from the video available somewhere, as
>>>>>>>>>>>>>>>>> well as a documentation on all this? I did not manage to find
>>>>>>>>>>>>>>>>> them.
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
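[Editorial note: the sizing rule of thumb quoted in this thread can be turned into a quick back-of-the-envelope calculation. All input numbers below are illustrative assumptions, not recommendations:]

```python
# Back-of-the-envelope sizing for the shared volume, per the rule of
# thumb quoted in the thread. All inputs are illustrative assumptions.
var_lib_nfs_bytes = 10 * 1024 * 1024   # assumed typical /var/lib/nfs size
per_client_bytes = 4 * 1024            # ~4 KiB of state per connected client
clients_per_server = 500               # assumed peak clients per NFS server
nfs_servers = 2                        # nodes in the HA cluster

shared_volume_bytes = nfs_servers * (
    var_lib_nfs_bytes + per_client_bytes * clients_per_server
)
print(shared_volume_bytes)  # 25067520 (~24 MiB)
```

Even with generous assumptions the state data stays small; the volume is kept replicated for availability rather than capacity.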