Alessandro De Salvo
2015-Jun-15 12:21 UTC
[Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Hi,
any news on this? Did you have the chance to look into that?
I'd also be curious to know if anyone tried nfs-ganesha on CentOS 7.1 and whether it was really working, as I also tried on a standalone, clean machine and I see the very same behavior, even without gluster.
Thanks,

Alessandro

On Fri, 2015-06-12 at 14:34 +0200, Alessandro De Salvo wrote:
> Hi,
> looking at the code and having recompiled adding some more debug, I
> might be wrong, but it seems that in nfs_rpc_dispatcher_thread.c,
> function nfs_rpc_dequeue_req, the threads enter the
> while (!(wqe->flags & Wqe_LFlag_SyncDone)) loop and never exit from there.
> I do not know if that is normal or not, as I should read the code more carefully.
> Cheers,
>
> Alessandro
>
> On Fri, 2015-06-12 at 09:35 +0200, Alessandro De Salvo wrote:
> > Hi Malahal,
> >
> > > On 12 Jun 2015, at 01:23, Malahal Naineni <malahal at us.ibm.com> wrote:
> > >
> > > The logs indicate that ganesha was started successfully without any
> > > exports. gstack output seemed normal as well -- threads were waiting to
> > > serve requests.
> >
> > Yes, no exports, as it was the default config before enabling Ganesha on any gluster volume.
> >
> > > Assuming that you are running "showmount -e" on the same system, there
> > > shouldn't be any firewall coming into the picture.
> >
> > Yes, that was the case in my last attempt, from the same machine. I also tried from another machine, but the result was the same. The firewall (firewalld, as it's a CentOS 7.1) is disabled anyway.
> >
> > > If you are running "showmount" from some other system, make sure there
> > > is no firewall dropping the packets.
> > >
> > > I think you need a tcpdump trace to figure out the problem. My wireshark
> > > trace showed two requests from the client to complete the "showmount -e"
> > > command:
> > >
> > > 1. Client sent a "GETPORT" call to port 111 (rpcbind) to get the port
> > >    number of MOUNT.
> > > 2. Then it sent an "EXPORT" call to the mountd port (the port it got in
> > >    response to #1).
> >
> > Yes, I did it already, and indeed it showed the two requests, so the portmapper works fine, but it hangs on the second request.
> > Also, "rpcinfo -t localhost portmapper" returns successfully, while "rpcinfo -t localhost nfs" hangs.
> > The output of rpcinfo -p is the following:
> >
> >    program vers proto   port  service
> >     100000    4   tcp    111  portmapper
> >     100000    3   tcp    111  portmapper
> >     100000    2   tcp    111  portmapper
> >     100000    4   udp    111  portmapper
> >     100000    3   udp    111  portmapper
> >     100000    2   udp    111  portmapper
> >     100024    1   udp  56082  status
> >     100024    1   tcp  41858  status
> >     100003    3   udp   2049  nfs
> >     100003    3   tcp   2049  nfs
> >     100003    4   udp   2049  nfs
> >     100003    4   tcp   2049  nfs
> >     100005    1   udp  45611  mountd
> >     100005    1   tcp  55915  mountd
> >     100005    3   udp  45611  mountd
> >     100005    3   tcp  55915  mountd
> >     100021    4   udp  48775  nlockmgr
> >     100021    4   tcp  51621  nlockmgr
> >     100011    1   udp   4501  rquotad
> >     100011    1   tcp   4501  rquotad
> >     100011    2   udp   4501  rquotad
> >     100011    2   tcp   4501  rquotad
> >
> > > What does "rpcinfo -p <server-ip>" show?
> > >
> > > Do you have selinux enabled? I am not sure if that is playing any role
> > > here...
> >
> > Nope, it's disabled:
> >
> > # uname -a
> > Linux node2 3.10.0-229.4.2.el7.x86_64 #1 SMP Wed May 13 10:06:09 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> >
> > Thanks for the help,
> >
> > Alessandro
> >
> > > Regards, Malahal.
> > >
> > > Alessandro De Salvo [Alessandro.DeSalvo at roma1.infn.it] wrote:
> > >> Hi,
> > >> this was an extract from the old logs, before Soumya's suggestion of
> > >> changing the rquota port in the conf file. The new logs are attached
> > >> (ganesha-20150611.log.gz), as well as the gstack of the ganesha process
> > >> while I was executing the hanging showmount (ganesha-20150611.gstack.gz).
> > >> Thanks,
> > >>
> > >> Alessandro
> > >>
> > >>> On Thu, 2015-06-11 at 11:37 -0500, Malahal Naineni wrote:
> > >>> Soumya Koduri [skoduri at redhat.com] wrote:
> > >>>> CCing ganesha-devel to get more inputs.
> > >>>>
> > >>>> In case ipv6 is enabled, only v6 interfaces are used by NFS-Ganesha.
> > >>>
> > >>> I am not a network expert, but I have seen IPv4 traffic over an IPv6
> > >>> interface while fixing a few things before. This may be normal.
> > >>>
> > >>>> commit d7e8f255 (git show 'd7e8f255'), which got added in v2.2, has more details.
> > >>>>
> > >>>>> # netstat -ltaupn | grep 2049
> > >>>>> tcp6  4  0 :::2049          :::*             LISTEN      32080/ganesha.nfsd
> > >>>>> tcp6  1  0 x.x.x.2:2049     x.x.x.2:33285    CLOSE_WAIT  -
> > >>>>> tcp6  1  0 127.0.0.1:2049   127.0.0.1:39555  CLOSE_WAIT  -
> > >>>>> udp6  0  0 :::2049          :::*                         32080/ganesha.nfsd
> > >>>>
> > >>>>>>> I have enabled the full debug already, but I see nothing special.
> > >>>>>>> Before exporting any volume the log shows no error, even when I do
> > >>>>>>> a showmount (the log is attached, ganesha.log.gz). If I do the same
> > >>>>>>> after exporting a volume, nfs-ganesha does not even start,
> > >>>>>>> complaining that it cannot bind the IPv6 rquota socket, but in fact
> > >>>>>>> there is nothing listening on IPv6, so it should not happen:
> > >>>>>>>
> > >>>>>>> tcp6  0  0 :::111                   :::*  LISTEN  7433/rpcbind
> > >>>>>>> tcp6  0  0 :::2224                  :::*  LISTEN  9054/ruby
> > >>>>>>> tcp6  0  0 :::22                    :::*  LISTEN  1248/sshd
> > >>>>>>> udp6  0  0 :::111                   :::*          7433/rpcbind
> > >>>>>>> udp6  0  0 fe80::8c2:27ff:fef2:123  :::*          31238/ntpd
> > >>>>>>> udp6  0  0 fe80::230:48ff:fed2:123  :::*          31238/ntpd
> > >>>>>>> udp6  0  0 fe80::230:48ff:fed2:123  :::*          31238/ntpd
> > >>>>>>> udp6  0  0 fe80::230:48ff:fed2:123  :::*          31238/ntpd
> > >>>>>>> udp6  0  0 ::1:123                  :::*          31238/ntpd
> > >>>>>>> udp6  0  0 fe80::5484:7aff:fef:123  :::*          31238/ntpd
> > >>>>>>> udp6  0  0 :::123                   :::*          31238/ntpd
> > >>>>>>> udp6  0  0 :::824                   :::*          7433/rpcbind
> > >>>>>>>
> > >>>>>>> The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following:
> > >>>>>>>
> > >>>>>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
> > >>>>>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
> > >>>>>>> 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
> > >>>
> > >>> The above messages indicate that someone tried to restart ganesha, but
> > >>> ganesha failed to come up because the RQUOTA port (default is 875) is
> > >>> already in use by an old ganesha instance or some other program holding
> > >>> it. The new instance of ganesha will die, but if you are using systemd,
> > >>> it will try to restart it automatically. We have disabled the systemd
> > >>> auto-restart in our environment as it was causing issues for debugging.
> > >>>
> > >>> What version of ganesha is this?
> > >>>
> > >>> Regards, Malahal.

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
Malahal Naineni
2015-Jun-15 16:47 UTC
[Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
We do run ganesha on RHEL 7.0 (same as CentOS 7.0), and I don't think 7.1 would be much different. We do run the GPFS FSAL only (no VFS_FSAL).

Regards, Malahal.

Alessandro De Salvo [Alessandro.DeSalvo at roma1.infn.it] wrote:
> Hi,
> any news on this? Did you have the chance to look into that?
> I'd also be curious to know if anyone tried nfs ganesha on CentOS 7.1
> and if it was really working, as I also tried on a standalone, clean
> machine, and I see the very same behavior, even without gluster.
> Thanks,
>
> Alessandro
>
> [...]
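[Editor's note] For anyone landing here with the same "Address already in use" failure on the rquota socket: Soumya's suggestion of changing the rquota port in the conf file corresponds to a ganesha.conf fragment along these lines (illustrative; the option name follows the NFS_Core_Param block syntax, and 4501 simply matches the rpcinfo output earlier in the thread):

```
NFS_Core_Param {
    # Move rquotad off the default port 875 when another process
    # (e.g. a stale ganesha instance) still holds it.
    Rquota_Port = 4501;
}
```

Independently, Malahal's point about systemd masking the failure can be addressed while debugging by overriding the service's Restart= setting to "no", so a failed bind stays visible instead of looping through restarts.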