thr3ads.net - Gluster users - [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size [Jun 2015]


Soumya Koduri

2015-Jun-11 16:16 UTC

[Gluster-users] Questions on ganesha HA and shared storage size

CCing ganesha-devel to get more input.

When IPv6 is enabled, NFS-Ganesha uses only v6 interfaces.

Commit d7e8f255 (see 'git show d7e8f255'), which went into v2.2, has more details.
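As a hedged illustration of why tcp6 sockets can carry IPv4 peers: on Linux, a tcp6 socket bound to '::' normally also accepts IPv4 connections, which netstat then lists under tcp6, unless the socket sets IPV6_V6ONLY or the net.ipv6.bindv6only sysctl is 1. The sketch below only reports that sysctl; the function name and the optional path argument are hypothetical, and whether NFS-Ganesha sets IPV6_V6ONLY on its own sockets is not checked here.

```shell
# Hypothetical helper: report whether new IPv6 sockets default to dual-stack.
# Reads the Linux sysctl net.ipv6.bindv6only (0 = dual-stack, 1 = v6-only).
# The optional argument overrides the /proc path (useful for testing).
bindv6only_mode() {
    local path=${1:-/proc/sys/net/ipv6/bindv6only}
    if [ ! -r "$path" ]; then
        echo "unknown"            # IPv6 disabled, or not a Linux /proc
        return 1
    fi
    case "$(cat "$path")" in
        0) echo "dual-stack" ;;   # tcp6 sockets on :: also accept IPv4 peers
        1) echo "v6-only" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

Usage: bindv6only_mode, with no argument, reads the live sysctl.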

 > # netstat -ltaupn | grep 2049
 > tcp6       4      0 :::2049                 :::*                    LISTEN      32080/ganesha.nfsd
 > tcp6       1      0 x.x.x.2:2049            x.x.x.2:33285           CLOSE_WAIT  -
 > tcp6       1      0 127.0.0.1:2049          127.0.0.1:39555         CLOSE_WAIT  -
 > udp6       0      0 :::2049                 :::*                                32080/ganesha.nfsd
 >

From both the logs and the netstat output, it looks like there was a
shutdown request even before the server had come out of the grace period.

10/06/2015 01:58:53 : epoch 55777da1 : node2 : ganesha.nfsd-20696[work-6] nfs_rpc_dequeue_req :DISP :F_DBG :dequeue_req try qpair REQ_Q_LOW_LATENCY 0x7fdf8dc67b00:0x7fdf8dc67b68
10/06/2015 01:58:53 : epoch 55777da1 : node2 : ganesha.nfsd-20696[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
......
10/06/2015 01:58:55 : epoch 55777da1 : node2 : ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of poll loop
10/06/2015 01:58:55 : epoch 55777da1 : node2 : ganesha.nfsd-20696[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
10/06/2015 01:58:55 : epoch 55777da1 : node2 : ganesha.nfsd-20696[work-12] nfs_rpc_consume_req :DISP :F_DBG :try splice, qpair REQ_Q_LOW_LATENCY consumer qsize=0 producer qsize=0
......
10/06/2015 01:59:52 : epoch 55777da1 : node2 : ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of poll loop
10/06/2015 01:59:52 : epoch 55777da1 : node2 : ganesha.nfsd-20696[Admin] do_shutdown :MAIN :EVENT :NFS EXIT: stopping NFS service
.......
10/06/2015 02:00:00 : epoch 55777da1 : node2 : ganesha.nfsd-20696[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
10/06/2015 02:00:00 : epoch 55777da1 : node2 : ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of poll loop
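The reasoning above, a do_shutdown arriving between "NFS Server Now IN GRACE" and "NFS Server Now NOT IN GRACE", can be checked mechanically on a full log. A minimal sketch, assuming one log record per line as in the raw ganesha.log; the function name is hypothetical and not part of NFS-Ganesha.

```shell
# Hypothetical helper (not part of NFS-Ganesha): read ganesha.log records on
# stdin and report whether a do_shutdown event was logged while the server
# was still in its grace period.  Assumes one log record per line.
shutdown_during_grace() {
    awk '
        /NFS Server Now NOT IN GRACE/ { in_grace = 0; next }  # grace ended
        /NFS Server Now IN GRACE/     { in_grace = 1; next }  # grace started
        /do_shutdown/ && in_grace     { found = 1 }
        END { print (found ? "shutdown during grace" : "no shutdown during grace") }
    '
}
```

Usage: shutdown_during_grace < /var/log/ganesha.log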

When you observe the hang, please capture the 'gstack <ganesha_pid>'
output and post it to the list.

Thanks,
Soumya

On 06/11/2015 12:37 AM, Alessandro De Salvo wrote:
> Hi,
> By looking at the connections I also see a strange problem:
>
> # netstat -ltaupn | grep 2049
> tcp6       4      0 :::2049                 :::*                    LISTEN      32080/ganesha.nfsd
> tcp6       1      0 x.x.x.2:2049            x.x.x.2:33285           CLOSE_WAIT  -
> tcp6       1      0 127.0.0.1:2049          127.0.0.1:39555         CLOSE_WAIT  -
> udp6       0      0 :::2049                 :::*                                32080/ganesha.nfsd
>
>
> Why is tcp6 used with an IPv4 address?
> On another machine where ganesha 2.1.0 is running I see tcp used, not
> tcp6.
> Could it be that the RPC services always try to use IPv6? That would be
> wrong.
> Thanks,
>
> 	Alessandro
>
> On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote:
>>
>> On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:
>>> Hi,
>>> I have already enabled full debug, but I see nothing special.
>>> Before exporting any volume the log shows no errors, even when I run
>>> showmount (the log is attached, ganesha.log.gz). If I do the same after
>>> exporting a volume, nfs-ganesha does not even start, complaining that it
>>> cannot bind the IPv6 rquota socket, but in fact there is nothing
>>> listening on IPv6, so it should not happen:
>>>
>>> tcp6       0      0 :::111                  :::*                    LISTEN      7433/rpcbind
>>> tcp6       0      0 :::2224                 :::*                    LISTEN      9054/ruby
>>> tcp6       0      0 :::22                   :::*                    LISTEN      1248/sshd
>>> udp6       0      0 :::111                  :::*                                7433/rpcbind
>>> udp6       0      0 fe80::8c2:27ff:fef2:123 :::*                                31238/ntpd
>>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
>>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
>>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
>>> udp6       0      0 ::1:123                 :::*                                31238/ntpd
>>> udp6       0      0 fe80::5484:7aff:fef:123 :::*                                31238/ntpd
>>> udp6       0      0 :::123                  :::*                                31238/ntpd
>>> udp6       0      0 :::824                  :::*                                7433/rpcbind
>>>
>>> The error, as shown in the attached ganesha-after-export.log.gz
>>> logfile, is the following:
>>>
>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
>>> 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
>>>
>>
>> We have seen such issues with rpcbind a few times. The NFS-Ganesha setup
>> first disables Gluster-NFS and then brings up the NFS-Ganesha service.
>> Sometimes there can be a delay or an issue with Gluster-NFS
>> unregistering those services, and when NFS-Ganesha tries to register on
>> the same port, it throws this error. Please try registering Rquota on a
>> different port using the config option below in "/etc/ganesha/ganesha.conf"
>>
>> NFS_Core_Param {
>>           #Use a non-privileged port for RQuota
>>           Rquota_Port = 4501;
>> }
>>
>> and clean up the '/var/cache/rpcbind/' directory before the setup.
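To see whether rquotad (RPC program 100011) is still registered with rpcbind before NFS-Ganesha starts, for example by a Gluster-NFS registration that was not cleaned up, the 'rpcinfo -p' listing can be filtered. A minimal sketch; the function name is hypothetical and it only reads rpcinfo output from stdin.

```shell
# Hypothetical helper: print the distinct ports on which RPC program 100011
# (rquotad) is registered, reading `rpcinfo -p` output on stdin.
# Column 1 is the program number and column 4 the port in that listing.
rquota_ports() {
    awk '$1 == 100011 { print $4 }' | sort -un
}
```

Usage: rpcinfo -p localhost | rquota_ports. As root, 'rpcinfo -d 100011 <version>' removes a registration; whether that alone clears the "Address already in use" bind error above is not established in this thread.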
>>
>> Thanks,
>> Soumya
>>
>>>
>>> Thanks,
>>>
>>> 	Alessandro
>>>
>>>
>>>
>>>
>>>> On 09 Jun 2015, at 18:37, Soumya Koduri <skoduri at redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:
>>>>> Another update: the fact that I was unable to use "vol set
>>>>> ganesha.enable" was due to another bug in the ganesha scripts. In
>>>>> short, they all use the following line to get the location of the
>>>>> conf file:
>>>>>
>>>>> CONF=$(cat /etc/sysconfig/ganesha | grep "CONFFILE" | cut -f 2 -d "=")
>>>>>
>>>>> First, by default there is no CONFFILE line in /etc/sysconfig/ganesha;
>>>>> second, there is a bug in that directive: it works if I add
>>>>> in /etc/sysconfig/ganesha
>>>>>
>>>>> CONFFILE=/etc/ganesha/ganesha.conf
>>>>>
>>>>> but it fails if the same value is quoted
>>>>>
>>>>> CONFFILE="/etc/ganesha/ganesha.conf"
>>>>>
>>>>> It would be much better to use the following, which also provides a
>>>>> default:
>>>>>
>>>>> eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
>>>>> CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
>>>>>
>>>>> I'll update the bug report.
>>>>> Having said this... the last issue to tackle is the real problem with
>>>>> the ganesha.nfsd :-(
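The proposed lookup can be wrapped in a function to show why it handles both forms: eval strips the optional quotes (the cut-based line keeps them as literal characters in the path), and ${CONFFILE:-...} falls back to the default when the line is missing. A sketch; the function name and the file argument are hypothetical, and since eval executes whatever the matching lines contain, the sysconfig file must be trusted.

```shell
# Hypothetical wrapper around the eval/grep lookup proposed above.
# Prints the ganesha config path, defaulting to /etc/ganesha/ganesha.conf.
ganesha_conf_path() {
    local sysconfig=${1:-/etc/sysconfig/ganesha}
    local CONFFILE=
    if [ -r "$sysconfig" ]; then
        # eval performs normal shell parsing, so CONFFILE="..." and
        # CONFFILE=... both yield the bare path; no match means a no-op.
        eval "$(grep -F 'CONFFILE=' "$sysconfig")"
    fi
    echo "${CONFFILE:-/etc/ganesha/ganesha.conf}"
}
```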
>>>>
>>>> Thanks. Could you try changing the log level to NIV_FULL_DEBUG in
>>>> '/etc/sysconfig/ganesha' and check whether anything gets logged in
>>>> '/var/log/ganesha.log' or '/ganesha.log'.
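A hedged sketch for raising the log level in place. It assumes /etc/sysconfig/ganesha passes the level to ganesha.nfsd through an '-N <level>' option on an OPTIONS line; that layout is an assumption, so check the packaged file first. The function name is hypothetical, and ganesha.nfsd has to be restarted for the change to take effect.

```shell
# Hypothetical helper: switch the "-N <level>" option in the sysconfig file
# to NIV_FULL_DEBUG, keeping a .bak copy.  ASSUMPTION: the level is passed
# as "-N NIV_..." on an OPTIONS line, e.g.
#   OPTIONS="-L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT"
# If no such option is present the file is left unchanged (apart from .bak).
set_ganesha_full_debug() {
    local sysconfig=${1:-/etc/sysconfig/ganesha}
    sed -i.bak -E 's/-N[[:space:]]+NIV_[A-Z_]+/-N NIV_FULL_DEBUG/' "$sysconfig"
}
```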
>>>>
>>>> Thanks,
>>>> Soumya
>>>>
>>>>> Cheers,
>>>>>
>>>>> 	Alessandro
>>>>>
>>>>>
>>>>> On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:
>>>>>> OK, I can confirm that the ganesha.nfsd process is actually not
>>>>>> answering the calls. Here is what I see:
>>>>>>
>>>>>> # rpcinfo -p
>>>>>>      program vers proto   port  service
>>>>>>       100000    4   tcp    111  portmapper
>>>>>>       100000    3   tcp    111  portmapper
>>>>>>       100000    2   tcp    111  portmapper
>>>>>>       100000    4   udp    111  portmapper
>>>>>>       100000    3   udp    111  portmapper
>>>>>>       100000    2   udp    111  portmapper
>>>>>>       100024    1   udp  41594  status
>>>>>>       100024    1   tcp  53631  status
>>>>>>       100003    3   udp   2049  nfs
>>>>>>       100003    3   tcp   2049  nfs
>>>>>>       100003    4   udp   2049  nfs
>>>>>>       100003    4   tcp   2049  nfs
>>>>>>       100005    1   udp  58127  mountd
>>>>>>       100005    1   tcp  56301  mountd
>>>>>>       100005    3   udp  58127  mountd
>>>>>>       100005    3   tcp  56301  mountd
>>>>>>       100021    4   udp  46203  nlockmgr
>>>>>>       100021    4   tcp  41798  nlockmgr
>>>>>>       100011    1   udp    875  rquotad
>>>>>>       100011    1   tcp    875  rquotad
>>>>>>       100011    2   udp    875  rquotad
>>>>>>       100011    2   tcp    875  rquotad
>>>>>>
>>>>>> # netstat -lpn | grep ganesha
>>>>>> tcp6      14      0 :::2049                 :::*                    LISTEN      11937/ganesha.nfsd
>>>>>> tcp6       0      0 :::41798                :::*                    LISTEN      11937/ganesha.nfsd
>>>>>> tcp6       0      0 :::875                  :::*                    LISTEN      11937/ganesha.nfsd
>>>>>> tcp6      10      0 :::56301                :::*                    LISTEN      11937/ganesha.nfsd
>>>>>> tcp6       0      0 :::564                  :::*                    LISTEN      11937/ganesha.nfsd
>>>>>> udp6       0      0 :::2049                 :::*                                11937/ganesha.nfsd
>>>>>> udp6       0      0 :::46203                :::*                                11937/ganesha.nfsd
>>>>>> udp6       0      0 :::58127                :::*                                11937/ganesha.nfsd
>>>>>> udp6       0      0 :::875                  :::*                                11937/ganesha.nfsd
>>>>>>
>>>>>> I'm attaching the strace of a showmount from one node to the other.
>>>>>> This setup was working with nfs-ganesha 2.1.0, so it must be
>>>>>> something introduced in 2.2.0.
>>>>>> Cheers,
>>>>>>
>>>>>> 	Alessandro
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:
>>>>>>>
>>>>>>> On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
>>>>>>>> Hi,
>>>>>>>> OK, the problem with the VIPs not starting is due to the ganesha_mon
>>>>>>>> heartbeat script looking for a pid file called
>>>>>>>> /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v2.2.0
>>>>>>>> creates /var/run/ganesha.pid; this needs to be corrected. The file
>>>>>>>> is in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
>>>>>>>> For the moment I have created a symlink this way, and it works:
>>>>>>>>
>>>>>>>> ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid
>>>>>>>>
>>>>>>> Thanks. Please update this as well in the bug.
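Until the pid-file names agree, scripts that need the daemon's PID (for example to run gstack on it) can try both names. A minimal sketch; the function name and the directory argument are hypothetical.

```shell
# Hypothetical helper: print the PID of ganesha.nfsd, trying the pid-file
# name used by ganesha.nfsd 2.2.0 first and the name ganesha_mon expects
# second.  Returns non-zero if neither file exists with content.
ganesha_pid() {
    local dir=${1:-/var/run} f
    for f in "$dir/ganesha.pid" "$dir/ganesha.nfsd.pid"; do
        if [ -s "$f" ]; then
            cat "$f"
            return 0
        fi
    done
    return 1
}
```

Usage: gstack "$(ganesha_pid)"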
>>>>>>>
>>>>>>>> So far so good: the VIPs are up and pingable, but there is still the
>>>>>>>> problem of the hanging showmount (i.e. hanging RPC).
>>>>>>>> Also, I see a lot of errors like this in /var/log/messages:
>>>>>>>>
>>>>>>>> Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished: nfs-mon_monitor_10000:29292:stderr [ Error: Resource does not exist. ]
>>>>>>>>
>>>>>>>> While ganesha.log shows the server is not in grace:
>>>>>>>>
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!!
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File ((null):0): Empty configuration file
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL
>>>>>>>> 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (2:2)
>>>>>>>> 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
>>>>>>>> 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
>>>>>>>> 09/06/2015 11:17:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
>>>>>>>>
>>>>>>>>
>>>>>>> Please check the status of nfs-ganesha:
>>>>>>> $ service nfs-ganesha status
>>>>>>>
>>>>>>> Could you try taking a packet trace (during showmount or mount) and
>>>>>>> check the server responses?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Soumya
>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Alessandro
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 09 Jun 2015, at 10:36, Alessandro De Salvo <alessandro.desalvo at roma1.infn.it> wrote:
>>>>>>>>>
>>>>>>>>> Hi Soumya,
>>>>>>>>>
>>>>>>>>>> On 09 Jun 2015, at 08:06, Soumya Koduri <skoduri at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
>>>>>>>>>>> OK, I found at least one of the bugs.
>>>>>>>>>>> /usr/libexec/ganesha/ganesha.sh has the following lines:
>>>>>>>>>>>
>>>>>>>>>>>      if [ -e /etc/os-release ]; then
>>>>>>>>>>>          RHEL6_PCS_CNAME_OPTION=""
>>>>>>>>>>>      fi
>>>>>>>>>>>
>>>>>>>>>>> This is OK for RHEL < 7, but does not work for >= 7. I have
>>>>>>>>>>> changed it to the following to make it work:
>>>>>>>>>>>
>>>>>>>>>>>      if [ -e /etc/os-release ]; then
>>>>>>>>>>>          eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
>>>>>>>>>>>          [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
>>>>>>>>>>>      fi
>>>>>>>>>>>
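The same check can be written as a function that sources os-release in a subshell instead of eval-ing a grep result. A sketch: the function name is hypothetical, and printing "--name" for non-Fedora systems assumes that is the script's default value of RHEL6_PCS_CNAME_OPTION.

```shell
# Sketch of the os-release test above.  ASSUMPTION: "--name" is the script's
# default RHEL6_PCS_CNAME_OPTION, kept on everything except Fedora.
# Sourcing inside $( ) keeps os-release variables out of the caller.
pcs_cname_option() {
    local osrel=${1:-/etc/os-release} product
    if [ ! -e "$osrel" ]; then
        printf '%s\n' "--name"      # e.g. RHEL/CentOS 6: no os-release file
        return 0
    fi
    product=$(. "$osrel" && printf '%s' "${REDHAT_SUPPORT_PRODUCT:-}")
    if [ "$product" = "Fedora" ]; then
        printf '\n'                 # Fedora: empty option
    else
        printf '%s\n' "--name"      # RHEL/CentOS 7 and others
    fi
}
```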
>>>>>>>>>> Oh... thanks for the fix. Could you please file a bug for it (and
>>>>>>>>>> probably submit your fix as well)? We shall have it corrected.
>>>>>>>>>
>>>>>>>>> Just did it: https://bugzilla.redhat.com/show_bug.cgi?id=1229601
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Apart from that, the VIP_<node> names I was using were wrong: I
>>>>>>>>>>> should have converted all the '-' to underscores. Maybe this could
>>>>>>>>>>> be mentioned in the documentation when you have it ready.
>>>>>>>>>>> Now the cluster starts, but the VIPs apparently do not:
>>>>>>>>>>>
>>>>>>>>>> Sure. Thanks again for pointing it out. We shall make a note of it.
>>>>>>>>>>
>>>>>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>>>>>
>>>>>>>>>>> Full list of resources:
>>>>>>>>>>>
>>>>>>>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>>>>>       Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>>>>>       Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>>>> atlas-node1-cluster_ip-1 (ocf::heartbeat:IPaddr):        Stopped
>>>>>>>>>>> atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>>>>> atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr):        Stopped
>>>>>>>>>>> atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>>>>> atlas-node1-dead_ip-1    (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>>>>> atlas-node2-dead_ip-1    (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>>>>>
>>>>>>>>>>> PCSD Status:
>>>>>>>>>>>    atlas-node1: Online
>>>>>>>>>>>    atlas-node2: Online
>>>>>>>>>>>
>>>>>>>>>>> Daemon Status:
>>>>>>>>>>>    corosync: active/disabled
>>>>>>>>>>>    pacemaker: active/disabled
>>>>>>>>>>>    pcsd: active/enabled
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Here corosync and pacemaker show the 'disabled' state. Can you check
>>>>>>>>>> the status of those services? They should be running prior to
>>>>>>>>>> cluster creation. We need to include that step in the document as
>>>>>>>>>> well.
>>>>>>>>>
>>>>>>>>> Ah, OK, you're right. I have added it to my puppet modules (we
>>>>>>>>> install and configure ganesha via puppet; I'll put the module on
>>>>>>>>> puppetforge soon, in case anyone is interested).
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> But the issue that puzzles me more is the following:
>>>>>>>>>>>
>>>>>>>>>>> # showmount -e localhost
>>>>>>>>>>> rpc mount export: RPC: Timed out
>>>>>>>>>>>
>>>>>>>>>>> And when I try to enable the ganesha exports on a volume I get
>>>>>>>>>>> this error:
>>>>>>>>>>>
>>>>>>>>>>> # gluster volume set atlas-home-01 ganesha.enable on
>>>>>>>>>>> volume set: failed: Failed to create NFS-Ganesha export config file.
>>>>>>>>>>>
>>>>>>>>>>> But I see the file created in /etc/ganesha/exports/*.conf
>>>>>>>>>>> Still, showmount hangs and times out.
>>>>>>>>>>> Any help?
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>> Hmm, that's strange. We have seen such issues when no proper
>>>>>>>>>> cleanup was done before re-creating the cluster.
>>>>>>>>>>
>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1227709
>>>>>>>>>>
>>>>>>>>>> http://review.gluster.org/#/c/11093/
>>>>>>>>>>
>>>>>>>>>> Can you please unexport all the volumes, tear down the cluster using
>>>>>>>>>> 'gluster vol set <volname> ganesha.enable off'
>>>>>>>>>
>>>>>>>>> OK:
>>>>>>>>>
>>>>>>>>> # gluster vol set atlas-home-01 ganesha.enable off
>>>>>>>>> volume set: failed: ganesha.enable is already 'off'.
>>>>>>>>>
>>>>>>>>> # gluster vol set atlas-data-01 ganesha.enable off
>>>>>>>>> volume set: failed: ganesha.enable is already 'off'.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 'gluster ganesha disable' command.
>>>>>>>>>
>>>>>>>>> I'm assuming you meant nfs-ganesha instead?
>>>>>>>>>
>>>>>>>>> # gluster nfs-ganesha disable
>>>>>>>>> ganesha enable : success
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> A side note (not really important): it's strange that when I do a
>>>>>>>>> disable, the message is 'ganesha enable' :-)
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Verify that the following files have been deleted on all the nodes:
>>>>>>>>>> '/etc/cluster/cluster.conf'
>>>>>>>>>
>>>>>>>>> This file is not present at all; I think it's not needed on CentOS 7.
>>>>>>>>>
>>>>>>>>>> '/etc/ganesha/ganesha.conf',
>>>>>>>>>
>>>>>>>>> It's still there, but empty, and I guess it should be OK, right?
>>>>>>>>>
>>>>>>>>>> '/etc/ganesha/exports/*'
>>>>>>>>>
>>>>>>>>> No more files there.
>>>>>>>>>
>>>>>>>>>> '/var/lib/pacemaker/cib'
>>>>>>>>>
>>>>>>>>> It's empty.
>>>>>>>>>
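The cleanup checklist above can be scripted per node. A sketch; the function name and the root-prefix argument are hypothetical, and the path list is just the one named in this thread (ganesha.conf is left out because the thread leaves open whether an empty one is acceptable).

```shell
# Hypothetical helper: list leftovers from a previous HA setup that should be
# gone before re-running "gluster nfs-ganesha enable".  Returns 1 if any are
# found.  The optional ROOT prefix (default: none) exists for testing.
check_ganesha_leftovers() {
    local root=${1:-} leftover=0 p path
    for p in /etc/cluster/cluster.conf /etc/ganesha/exports /var/lib/pacemaker/cib; do
        path="$root$p"
        # A regular file counts as a leftover; a directory only if non-empty.
        if [ -f "$path" ] || { [ -d "$path" ] && [ -n "$(ls -A "$path")" ]; }; then
            echo "leftover: $p"
            leftover=1
        fi
    done
    return "$leftover"
}
```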
>>>>>>>>>>
>>>>>>>>>> Verify that the ganesha service is stopped on all the nodes.
>>>>>>>>>
>>>>>>>>> Nope, it's still running; I will stop it.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Start/restart the services: corosync, pcs.
>>>>>>>>>
>>>>>>>>> On the node where I issued the nfs-ganesha disable there is no
>>>>>>>>> /etc/corosync/corosync.conf any more, so corosync won't start. The
>>>>>>>>> other node, instead, still has the file; it's strange.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And re-try the HA cluster creation:
>>>>>>>>>> 'gluster ganesha enable'
>>>>>>>>>
>>>>>>>>> This time (repeated twice) it did not work at all:
>>>>>>>>>
>>>>>>>>> # pcs status
>>>>>>>>> Cluster name: ATLAS_GANESHA_01
>>>>>>>>> Last updated: Tue Jun  9 10:13:43 2015
>>>>>>>>> Last change: Tue Jun  9 10:13:22 2015
>>>>>>>>> Stack: corosync
>>>>>>>>> Current DC: atlas-node1 (1) - partition with quorum
>>>>>>>>> Version: 1.1.12-a14efad
>>>>>>>>> 2 Nodes configured
>>>>>>>>> 6 Resources configured
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>>>
>>>>>>>>> Full list of resources:
>>>>>>>>>
>>>>>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>>>       Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>>>       Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>> atlas-node2-dead_ip-1    (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>>> atlas-node1-dead_ip-1    (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>>>
>>>>>>>>> PCSD Status:
>>>>>>>>>    atlas-node1: Online
>>>>>>>>>    atlas-node2: Online
>>>>>>>>>
>>>>>>>>> Daemon Status:
>>>>>>>>>    corosync: active/enabled
>>>>>>>>>    pacemaker: active/enabled
>>>>>>>>>    pcsd: active/enabled
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I then tried "pcs cluster destroy" on both nodes, and then
>>>>>>>>> nfs-ganesha enable again, but now I'm back to the old problem:
>>>>>>>>>
>>>>>>>>> # pcs status
>>>>>>>>> Cluster name: ATLAS_GANESHA_01
>>>>>>>>> Last updated: Tue Jun  9 10:22:27 2015
>>>>>>>>> Last change: Tue Jun  9 10:17:00 2015
>>>>>>>>> Stack: corosync
>>>>>>>>> Current DC: atlas-node2 (2) - partition with quorum
>>>>>>>>> Version: 1.1.12-a14efad
>>>>>>>>> 2 Nodes configured
>>>>>>>>> 10 Resources configured
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Online: [ atlas-node1 atlas-node2 ]
>>>>>>>>>
>>>>>>>>> Full list of resources:
>>>>>>>>>
>>>>>>>>> Clone Set: nfs-mon-clone [nfs-mon]
>>>>>>>>>       Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>> Clone Set: nfs-grace-clone [nfs-grace]
>>>>>>>>>       Started: [ atlas-node1 atlas-node2 ]
>>>>>>>>> atlas-node1-cluster_ip-1      (ocf::heartbeat:IPaddr):        Stopped
>>>>>>>>> atlas-node1-trigger_ip-1      (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>>> atlas-node2-cluster_ip-1      (ocf::heartbeat:IPaddr):        Stopped
>>>>>>>>> atlas-node2-trigger_ip-1      (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>>> atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
>>>>>>>>> atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
>>>>>>>>>
>>>>>>>>> PCSD Status:
>>>>>>>>>    atlas-node1: Online
>>>>>>>>>    atlas-node2: Online
>>>>>>>>>
>>>>>>>>> Daemon Status:
>>>>>>>>>    corosync: active/enabled
>>>>>>>>>    pacemaker: active/enabled
>>>>>>>>>    pcsd: active/enabled
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Alessandro
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Soumya
>>>>>>>>>>
>>>>>>>>>>> Alessandro
>>>>>>>>>>>
>>>>>>>>>>>> On 08 Jun 2015, at 20:00, Alessandro De Salvo <Alessandro.DeSalvo at roma1.infn.it> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> indeed, it does not work :-)
>>>>>>>>>>>> OK, this is what I did, with 2 machines running CentOS 7.1,
>>>>>>>>>>>> Glusterfs 3.7.1 and nfs-ganesha 2.2.0:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) ensured that the machines are able to resolve their IPs (but
>>>>>>>>>>>> this was already true since they were in the DNS);
>>>>>>>>>>>> 2) disabled NetworkManager and enabled network on both machines;
>>>>>>>>>>>> 3) created a gluster shared volume 'gluster_shared_storage' and
>>>>>>>>>>>> mounted it on '/run/gluster/shared_storage' on all the cluster
>>>>>>>>>>>> nodes using the glusterfs native mount (on CentOS 7.1 there is a
>>>>>>>>>>>> link /var/run -> ../run by default);
>>>>>>>>>>>> 4) created an empty /etc/ganesha/ganesha.conf;
>>>>>>>>>>>> 5) installed pacemaker, pcs, resource-agents and corosync on all
>>>>>>>>>>>> cluster machines;
>>>>>>>>>>>> 6) set the same password for the 'hacluster' user on all machines;
>>>>>>>>>>>> 7) ran pcs cluster auth <hostname> -u hacluster -p <pass> on all
>>>>>>>>>>>> the nodes (on both nodes I issued the commands for both nodes);
>>>>>>>>>>>> 8) IPv6 is configured by default on all nodes, although the
>>>>>>>>>>>> infrastructure is not ready for IPv6;
>>>>>>>>>>>> 9) enabled pcsd and started it on all nodes;
>>>>>>>>>>>> 10) populated /etc/ganesha/ganesha-ha.conf with the following
>>>>>>>>>>>> contents, one per machine:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ===> atlas-node1
>>>>>>>>>>>> # Name of the HA cluster created.
>>>>>>>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>>>>>>>> # The server from which you intend to mount
>>>>>>>>>>>> # the shared volume.
>>>>>>>>>>>> HA_VOL_SERVER="atlas-node1"
>>>>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>>>>>>>> # is specified.
>>>>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>>>>>>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>>>>>>>> VIP_atlas-node1="x.x.x.1"
>>>>>>>>>>>> VIP_atlas-node2="x.x.x.2"
>>>>>>>>>>>>
>>>>>>>>>>>> ===> atlas-node2
>>>>>>>>>>>> # Name of the HA cluster created.
>>>>>>>>>>>> HA_NAME="ATLAS_GANESHA_01"
>>>>>>>>>>>> # The server from which you intend to mount
>>>>>>>>>>>> # the shared volume.
>>>>>>>>>>>> HA_VOL_SERVER="atlas-node2"
>>>>>>>>>>>> # The subset of nodes of the Gluster Trusted Pool
>>>>>>>>>>>> # that forms the ganesha HA cluster. IP/Hostname
>>>>>>>>>>>> # is specified.
>>>>>>>>>>>> HA_CLUSTER_NODES="atlas-node1,atlas-node2"
>>>>>>>>>>>> # Virtual IPs of each of the nodes specified above.
>>>>>>>>>>>> VIP_atlas-node1="x.x.x.1"
>>>>>>>>>>>> VIP_atlas-node2="x.x.x.2"
>>>>>>>>>>>>
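Why the hyphens in VIP_<node> matter: shell variable names may contain only letters, digits and underscores, so a line like VIP_atlas-node1="x.x.x.1" is not an assignment when the file is read by a shell script (the shell tries to run it as a command). A sketch of mapping a node name to its VIP_<node> key and looking it up; the helper names are hypothetical, and this is not necessarily how the HA scripts themselves do it.

```shell
# Hypothetical helpers: map a node name to its VIP_<node> variable name
# ('-' -> '_') and read that variable's value.
vip_var_name() {
    printf 'VIP_%s\n' "$(printf '%s' "$1" | sed 's/-/_/g')"
}

vip_for_node() {
    local var value
    var=$(vip_var_name "$1")
    # Indirect lookup via eval; only safe for plain hostnames
    # (letters, digits, '-'), which map to valid variable names.
    eval "value=\${$var:-}"
    printf '%s\n' "$value"
}
```

For example, with VIP_atlas_node1="x.x.x.1" in the config, vip_for_node atlas-node1 prints x.x.x.1.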
>>>>>>>>>>>> 11) issued gluster nfs-ganesha enable, but it fails with a cryptic
>>>>>>>>>>>> message:
>>>>>>>>>>>>
>>>>>>>>>>>> # gluster nfs-ganesha enable
>>>>>>>>>>>> Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the
>>>>>>>>>>>> trusted pool. Do you still want to continue? (y/n) y
>>>>>>>>>>>> nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha.
>>>>>>>>>>>> Please check the log file for details
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at the logs I found nothing really special but this:
>>>>>>>>>>>>
>>>>>>>>>>>> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
>>>>>>>>>>>> [2015-06-08 17:57:15.672844] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
>>>>>>>>>>>> [2015-06-08 17:57:15.675395] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
>>>>>>>>>>>> [2015-06-08 17:57:15.720692] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
>>>>>>>>>>>> [2015-06-08 17:57:15.721161] I [glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host found Hostname is atlas-node2
>>>>>>>>>>>> [2015-06-08 17:57:16.633048] E [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management: Initial NFS-Ganesha set up failed
>>>>>>>>>>>> [2015-06-08 17:57:16.641563] E [glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of operation 'Volume (null)' failed on localhost : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
>>>>>>>>>>>>
>>>>>>>>>>>> ==> /var/log/glusterfs/cmd_history.log <==
>>>>>>>>>>>> [2015-06-08 17:57:16.643615]  : nfs-ganesha enable : FAILED : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
>>>>>>>>>>>>
>>>>>>>>>>>> ==> /var/log/glusterfs/cli.log <==
>>>>>>>>>>>> [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting with: -1
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Also, pcs seems to be fine for the auth part, although it obviously
>>>>>>>>>>>> tells me the cluster is not running.
>>>>>>>>>>>>
>>>>>>>>>>>> I, [2015-06-08T19:57:16.305323 #7223]  INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name
>>>>>>>>>>>> I, [2015-06-08T19:57:16.345457 #7223]  INFO -- : Running: /usr/sbin/pcs cluster token-nodes
>>>>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth HTTP/1.1" 200 68 0.1919
>>>>>>>>>>>> ::ffff:141.108.38.46 - - [08/Jun/2015 19:57:16] "GET /remote/check_auth HTTP/1.1" 200 68 0.1920
>>>>>>>>>>>> atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] "GET /remote/check_auth HTTP/1.1" 200 68
>>>>>>>>>>>> - -> /remote/check_auth
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> What am I doing wrong?
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>
>>>>>>>>>>>>> On 08 Jun 2015, at 19:30, Soumya Koduri <skoduri at redhat.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
>>>>>>>>>>>>>> Sorry, just another question:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - in my installation of gluster 3.7.1 the command gluster
>>>>>>>>>>>>>> features.ganesha enable does not work:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # gluster features.ganesha enable
>>>>>>>>>>>>>> unrecognized word: features.ganesha (position 0)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Which version has full support for it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry. This option has recently been changed. It is now
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ gluster nfs-ganesha enable
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - in the documentation
the ccs and cman packages are required,
>>>>>>>>>>>>>> but they seems not to
be available anymore on CentOS 7 and
>>>>>>>>>>>>>> similar, I guess they
are not really required anymore, as pcs
>>>>>>>>>>>>>> should do the full job
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looks like so from
http://clusterlabs.org/quickstart-redhat.html.
>>>>>>>>>>>>> Let us know if it
doesn't work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Soumya
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 08 Jun 2015, at 15:09, Alessandro De Salvo
>>>>>>>>>>>>>>> <alessandro.desalvo at roma1.infn.it
>>>>>>>>>>>>>>> <mailto:alessandro.desalvo at roma1.infn.it>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Great, many thanks Soumya!
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alessandro
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 08 Jun 2015, at 13:53, Soumya Koduri
>>>>>>>>>>>>>>>> <skoduri at redhat.com <mailto:skoduri at redhat.com>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please find the slides of the demo video at [1]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We recommend using a distributed-replicate volume as the shared
>>>>>>>>>>>>>>>> volume for better data availability.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The size of the volume depends on your workload. Since it is
>>>>>>>>>>>>>>>> used to maintain the state of NLM/NFSv4 clients, you may
>>>>>>>>>>>>>>>> estimate its minimum size as the aggregate of
>>>>>>>>>>>>>>>> (typical size of the '/var/lib/nfs' directory +
>>>>>>>>>>>>>>>>  ~4k * number of clients connected to each of the NFS servers at any point)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We will document this feature in the gluster docs soon
>>>>>>>>>>>>>>>> as well.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Soumya
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> I have seen the demo video on ganesha HA,
>>>>>>>>>>>>>>>>> https://www.youtube.com/watch?v=Z4mvTQC-efM
>>>>>>>>>>>>>>>>> However, there is no advice on the appropriate size of the
>>>>>>>>>>>>>>>>> shared volume. How is it really used, and what should be a
>>>>>>>>>>>>>>>>> reasonable size for it?
>>>>>>>>>>>>>>>>> Also, are the slides from the video available somewhere, as
>>>>>>>>>>>>>>>>> well as documentation on all this? I did not manage to find
>>>>>>>>>>>>>>>>> them.
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Alessandro

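Soumya's sizing rule for the shared volume can be turned into a quick back-of-the-envelope calculation. The sketch below is only an illustration under stated assumptions: "~4k" is read as 4 KiB per connected client, the aggregate is taken over all NFS-Ganesha servers in the HA cluster, and the helper name `shared_volume_min_bytes` as well as the example figures (four servers, ~1 MiB of /var/lib/nfs each, up to 500 clients per server) are hypothetical, not measured values.

```python
from typing import Sequence

# Assumption: "~4k" in the rule of thumb means 4 KiB per connected client.
PER_CLIENT_BYTES = 4 * 1024


def shared_volume_min_bytes(var_lib_nfs_bytes: int,
                            clients_per_server: Sequence[int],
                            per_client_bytes: int = PER_CLIENT_BYTES) -> int:
    """Rough lower bound for the shared-state volume (hypothetical helper).

    For every NFS server, take the typical size of its /var/lib/nfs
    directory plus ~4k for each client connected to that server at any
    point, then sum over all servers.
    """
    return sum(var_lib_nfs_bytes + per_client_bytes * n
               for n in clients_per_server)


if __name__ == "__main__":
    # Hypothetical example: 4 servers, ~1 MiB of /var/lib/nfs each,
    # up to 500 clients per server.
    size = shared_volume_min_bytes(1024 * 1024, [500, 500, 500, 500])
    print(size, "bytes =", round(size / 1024**2, 2), "MiB")  # 12386304 bytes = 11.81 MiB
```

With figures of this order the client state itself amounts to megabytes, so the choice of a distributed-replicate volume is driven more by availability than by capacity.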
Malahal Naineni

2015-Jun-11 16:37 UTC


[Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size

Soumya Koduri [skoduri at redhat.com] wrote:
> CC'ing ganesha-devel to get more inputs.
> 
> When IPv6 is enabled, only v6 interfaces are used by NFS-Ganesha.

I am not a network expert, but I have seen IPv4 traffic over an IPv6
interface while fixing a few things before. This may be normal.

> commit - git show 'd7e8f255', which got added in v2.2, has more
> details.
> 
>  > # netstat -ltaupn | grep 2049
>  > tcp6       4      0 :::2049                 :::*                    LISTEN      32080/ganesha.nfsd
>  > tcp6       1      0 x.x.x.2:2049            x.x.x.2:33285           CLOSE_WAIT  -
>  > tcp6       1      0 127.0.0.1:2049          127.0.0.1:39555         CLOSE_WAIT  -
>  > udp6       0      0 :::2049                 :::*                                32080/ganesha.nfsd
>  >
> 
> >>> I have enabled the full debug already, but I see nothing special.
> >>> Before exporting any volume the log shows no error, even when I do a
> >>> showmount (the log is attached, ganesha.log.gz). If I do the same after
> >>> exporting a volume, nfs-ganesha does not even start, complaining that it
> >>> cannot bind the IPv6 rquota socket, but in fact there is nothing
> >>> listening on IPv6, so it should not happen:
> >>>
> >>> tcp6       0      0 :::111                  :::*                    LISTEN      7433/rpcbind
> >>> tcp6       0      0 :::2224                 :::*                    LISTEN      9054/ruby
> >>> tcp6       0      0 :::22                   :::*                    LISTEN      1248/sshd
> >>> udp6       0      0 :::111                  :::*                                7433/rpcbind
> >>> udp6       0      0 fe80::8c2:27ff:fef2:123 :::*                                31238/ntpd
> >>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
> >>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
> >>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
> >>> udp6       0      0 ::1:123                 :::*                                31238/ntpd
> >>> udp6       0      0 fe80::5484:7aff:fef:123 :::*                                31238/ntpd
> >>> udp6       0      0 :::123                  :::*                                31238/ntpd
> >>> udp6       0      0 :::824                  :::*                                7433/rpcbind
> >>>
> >>> The error, as shown in the attached ganesha-after-export.log.gz
> >>> logfile, is the following:
> >>>
> >>>
> >>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
> >>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
> >>> 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
> >>>

The above messages indicate that someone tried to restart ganesha, but
ganesha failed to come up because the RQUOTA port (default 875) is
already in use by an old ganesha instance or some other program holding
it. The new instance of ganesha will die, but if you are using systemd,
it will try to restart it automatically. We have disabled systemd auto
restart in our environment as it was causing issues for debugging.
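To test this explanation before restarting ganesha.nfsd, you can try to bind the RQUOTA port yourself and see whether the kernel refuses it. The sketch below is a hypothetical helper (`bind_error` is not part of Ganesha or Gluster); it assumes Python 3 and root privileges, since 875 is a privileged port and without root you get EACCES instead of an answer. On Linux, an IPv6 socket bound to `::` is dual-stack by default (`net.ipv6.bindv6only = 0`), so it also collides with a process holding 0.0.0.0:875 over IPv4. If Ganesha's RQUOTA tcp6 socket is dual-stack (an assumption here), a netstat listing restricted to the tcp6/udp6 rows cannot show the conflicting holder.

```python
import errno
import os
import socket
from typing import Optional

RQUOTA_DEFAULT_PORT = 875  # Ganesha's default RQUOTA port, as stated in the thread


def bind_error(port: int, host: str = "::") -> Optional[int]:
    """Try to bind a TCP socket to (host, port), as a server would.

    Returns None if the bind succeeds (the port is free); otherwise returns
    the errno, e.g. errno.EADDRINUSE (98 on Linux) when another socket
    already holds the port. The probe socket is closed right away.
    """
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            # Like most servers, set SO_REUSEADDR so leftover TIME_WAIT
            # connections are not reported; an active listener still blocks bind().
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind((host, port))
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    err = bind_error(RQUOTA_DEFAULT_PORT)
    if err is None:
        print(f"[::]:{RQUOTA_DEFAULT_PORT} is free")
    elif err == errno.EADDRINUSE:
        print(f"[::]:{RQUOTA_DEFAULT_PORT} is already in use (errno {err})")
    else:
        print(f"bind failed: errno {err} ({os.strerror(err)})")
```

If it reports errno 98, `ss -ltnp` (or `netstat -ltnp`, i.e. without limiting the output to IPv6) should show which process holds the port, for example a stale ganesha.nfsd or another RQUOTA service.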

What version of ganesha is this?

Regards, Malahal.

Frank Filz

2015-Jun-11 16:44 UTC


[Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size

> Soumya Koduri [skoduri at redhat.com] wrote:
> > CC'ing ganesha-devel to get more inputs.
> >
> > When IPv6 is enabled, only v6 interfaces are used by NFS-Ganesha.
> 
> I am not a network expert, but I have seen IPv4 traffic over an IPv6 interface
> while fixing a few things before. This may be normal.

IPv6 can encapsulate IPv4 traffic. In my testing I use IPv4 addresses, but
they are encapsulated in IPv6 (which is what forced me to get Ganesha's
support for that to actually work...).
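What is described here most likely corresponds to IPv4-mapped IPv6 addresses (`::ffff:a.b.c.d`): on a dual-stack host, an IPv6 socket listening on `::` also accepts IPv4 clients unless IPV6_V6ONLY is set, and those clients then appear with mapped addresses. That matches the `::ffff:141.108.38.46` client address in the pcsd log quoted earlier in this thread, and the tcp6 rows carrying IPv4 addresses in the quoted netstat output. A minimal sketch using only Python's standard `ipaddress` module to turn such addresses back into plain IPv4; the helper name `unmap` is hypothetical.

```python
import ipaddress


def unmap(addr: str) -> str:
    """Return the plain IPv4 form of an IPv4-mapped IPv6 address.

    Any other address (native IPv4 or ordinary IPv6) is returned in its
    normalized textual form.
    """
    ip = ipaddress.ip_address(addr)
    # ipv4_mapped is None unless the address has the form ::ffff:a.b.c.d
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        return str(ip.ipv4_mapped)
    return str(ip)


if __name__ == "__main__":
    for a in ("::ffff:141.108.38.46", "::ffff:127.0.0.1", "::1", "141.108.38.46"):
        print(f"{a:>22} -> {unmap(a)}")
```

Normalizing addresses this way is mainly useful when comparing client addresses in Ganesha or pcsd logs against IPv4-based export or firewall rules.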
> > commit - git show 'd7e8f255', which got added in v2.2, has more details.
> >
> >  > # netstat -ltaupn | grep 2049
> >  > tcp6       4      0 :::2049                 :::*                    LISTEN      32080/ganesha.nfsd
> >  > tcp6       1      0 x.x.x.2:2049            x.x.x.2:33285           CLOSE_WAIT  -
> >  > tcp6       1      0 127.0.0.1:2049          127.0.0.1:39555         CLOSE_WAIT  -
> >  > udp6       0      0 :::2049                 :::*                                32080/ganesha.nfsd
> >  >
> >
> > >>> I have enabled the full debug already, but I see nothing special.
> > >>> Before exporting any volume the log shows no error, even when I do a
> > >>> showmount (the log is attached, ganesha.log.gz). If I do the same after
> > >>> exporting a volume, nfs-ganesha does not even start, complaining that it
> > >>> cannot bind the IPv6 rquota socket, but in fact there is nothing
> > >>> listening on IPv6, so it should not happen:
> > >>>
> > >>> tcp6       0      0 :::111                  :::*                    LISTEN      7433/rpcbind
> > >>> tcp6       0      0 :::2224                 :::*                    LISTEN      9054/ruby
> > >>> tcp6       0      0 :::22                   :::*                    LISTEN      1248/sshd
> > >>> udp6       0      0 :::111                  :::*                                7433/rpcbind
> > >>> udp6       0      0 fe80::8c2:27ff:fef2:123 :::*                                31238/ntpd
> > >>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
> > >>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
> > >>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
> > >>> udp6       0      0 ::1:123                 :::*                                31238/ntpd
> > >>> udp6       0      0 fe80::5484:7aff:fef:123 :::*                                31238/ntpd
> > >>> udp6       0      0 :::123                  :::*                                31238/ntpd
> > >>> udp6       0      0 :::824                  :::*                                7433/rpcbind
> > >>>
> > >>> The error, as shown in the attached ganesha-after-export.log.gz
> > >>> logfile, is the following:
> > >>>
> > >>>
> > >>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
> > >>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
> > >>> 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
> > >>>
> 
> The above messages indicate that someone tried to restart ganesha, but
> ganesha failed to come up because the RQUOTA port (default 875) is already
> in use by an old ganesha instance or some other program holding it. The new
> instance of ganesha will die, but if you are using systemd, it will try to
> restart it automatically. We have disabled systemd auto restart in our
> environment as it was causing issues for debugging.
> 
> What version of ganesha is this?
> 
> Regards, Malahal.

Alessandro De Salvo

2015-Jun-11 18:08 UTC


[Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size

Hi,
this was an extract from the old logs, before Soumya's suggestion of
changing the rquota port in the conf file. The new logs are attached
(ganesha-20150611.log.gz) as well as the gstack of the ganesha process
while I was executing the hanging showmount
(ganesha-20150611.gstack.gz).
Thanks,

	Alessandro



On Thu, 2015-06-11 at 11:37 -0500, Malahal Naineni wrote:
> Soumya Koduri [skoduri at redhat.com] wrote:
> > CC'ing ganesha-devel to get more inputs.
> > 
> > When IPv6 is enabled, only v6 interfaces are used by NFS-Ganesha.
> 
> I am not a network expert, but I have seen IPv4 traffic over an IPv6
> interface while fixing a few things before. This may be normal.
> 
> > commit - git show 'd7e8f255', which got added in v2.2, has more details.
> > 
> >  > # netstat -ltaupn | grep 2049
> >  > tcp6       4      0 :::2049                 :::*                    LISTEN      32080/ganesha.nfsd
> >  > tcp6       1      0 x.x.x.2:2049            x.x.x.2:33285           CLOSE_WAIT  -
> >  > tcp6       1      0 127.0.0.1:2049          127.0.0.1:39555         CLOSE_WAIT  -
> >  > udp6       0      0 :::2049                 :::*                                32080/ganesha.nfsd
> >  >
> > 
> > >>> I have enabled the full debug already, but I see nothing special.
> > >>> Before exporting any volume the log shows no error, even when I do a
> > >>> showmount (the log is attached, ganesha.log.gz). If I do the same after
> > >>> exporting a volume, nfs-ganesha does not even start, complaining that it
> > >>> cannot bind the IPv6 rquota socket, but in fact there is nothing
> > >>> listening on IPv6, so it should not happen:
> > >>>
> > >>> tcp6       0      0 :::111                  :::*                    LISTEN      7433/rpcbind
> > >>> tcp6       0      0 :::2224                 :::*                    LISTEN      9054/ruby
> > >>> tcp6       0      0 :::22                   :::*                    LISTEN      1248/sshd
> > >>> udp6       0      0 :::111                  :::*                                7433/rpcbind
> > >>> udp6       0      0 fe80::8c2:27ff:fef2:123 :::*                                31238/ntpd
> > >>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
> > >>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
> > >>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
> > >>> udp6       0      0 ::1:123                 :::*                                31238/ntpd
> > >>> udp6       0      0 fe80::5484:7aff:fef:123 :::*                                31238/ntpd
> > >>> udp6       0      0 :::123                  :::*                                31238/ntpd
> > >>> udp6       0      0 :::824                  :::*                                7433/rpcbind
> > >>>
> > >>> The error, as shown in the attached ganesha-after-export.log.gz
> > >>> logfile, is the following:
> > >>>
> > >>>
> > >>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
> > >>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
> > >>> 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
> > >>>
> 
> The above messages indicate that someone tried to restart ganesha, but
> ganesha failed to come up because the RQUOTA port (default 875) is
> already in use by an old ganesha instance or some other program holding
> it. The new instance of ganesha will die, but if you are using systemd,
> it will try to restart it automatically. We have disabled systemd auto
> restart in our environment as it was causing issues for debugging.
> 
> What version of ganesha is this?
> 
> Regards, Malahal.
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ganesha-20150611.gstack.gz
Type: application/gzip
Size: 808 bytes
Desc: not available
URL:
<http://www.gluster.org/pipermail/gluster-users/attachments/20150611/1ec06578/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ganesha-20150611.log.gz
Type: application/gzip
Size: 2981 bytes
Desc: not available
URL:
<http://www.gluster.org/pipermail/gluster-users/attachments/20150611/1ec06578/attachment-0001.bin>
