Alessandro De Salvo
2015-Jun-12 07:35 UTC
[Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Hi Malahal,

> On 12 Jun 2015, at 01:23, Malahal Naineni <malahal at us.ibm.com> wrote:
>
> The logs indicate that ganesha was started successfully without any
> exports. gstack output seemed normal as well -- threads were waiting to
> serve requests.

Yes, there were no exports, as that was the default config before enabling Ganesha on any gluster volume.

> Assuming that you are running "showmount -e" on the same system, there
> shouldn't be any firewall coming into the picture.

Yes, that was the case in my last attempt: I ran it from the same machine. I also tried from another machine, but the result was the same. The firewall (firewalld, as it's CentOS 7.1) is disabled anyway.

> If you are running "showmount" from some other system, make sure there
> is no firewall dropping the packets.
>
> I think you need a tcpdump trace to figure out the problem. My wireshark
> trace showed two requests from the client to complete the "showmount -e"
> command:
>
> 1. The client sent a "GETPORT" call to port 111 (rpcbind) to get the port
>    number of MOUNT.
> 2. Then it sent an "EXPORT" call to the mountd port (the port it got in
>    response to #1).

Yes, I did that already, and indeed it showed the two requests, so the portmapper works fine, but it hangs on the second request.
Also, "rpcinfo -t localhost portmapper" returns successfully, while "rpcinfo -t localhost nfs" hangs.
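(The two requests behind "showmount -e" can be made concrete. Below is a minimal, illustrative Python sketch of the first one, hand-packing the XDR body of a PMAPPROC_GETPORT call as specified in RFC 1833 / RFC 5531; the names and layout are my own shorthand, not showmount's actual source.)

```python
import struct

# ONC RPC constants (RFC 5531 / RFC 1833)
PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT = 100000, 2, 3
MOUNT_PROG = 100005   # the program whose port showmount needs
IPPROTO_TCP = 6

def getport_call(xid, prog, vers, proto=IPPROTO_TCP):
    """Build the XDR body of a PMAPPROC_GETPORT call (no TCP record mark)."""
    header = struct.pack(">6I",
                         xid, 0,      # xid, msg_type = CALL(0)
                         2,           # RPC protocol version
                         PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT)
    auth = struct.pack(">4I", 0, 0, 0, 0)            # AUTH_NONE cred + verf
    args = struct.pack(">4I", prog, vers, proto, 0)  # mapping we ask about
    return header + auth + args

# Step 1 of showmount: ask rpcbind (port 111) where MOUNT v3 lives;
# step 2 then sends MOUNTPROC_EXPORT to the port rpcbind returns.
msg = getport_call(1, MOUNT_PROG, 3)
```

In the trace above, step 1 succeeds (the portmapper answers) and the hang is on step 2, i.e. after mountd's port has already been resolved.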
The output of rpcinfo -p is the following:

   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  56082  status
    100024    1   tcp  41858  status
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   udp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    1   udp  45611  mountd
    100005    1   tcp  55915  mountd
    100005    3   udp  45611  mountd
    100005    3   tcp  55915  mountd
    100021    4   udp  48775  nlockmgr
    100021    4   tcp  51621  nlockmgr
    100011    1   udp   4501  rquotad
    100011    1   tcp   4501  rquotad
    100011    2   udp   4501  rquotad
    100011    2   tcp   4501  rquotad

> What does "rpcinfo -p <server-ip>" show?
>
> Do you have selinux enabled? I am not sure if that is playing any role
> here...

Nope, it's disabled:

# uname -a
Linux node2 3.10.0-229.4.2.el7.x86_64 #1 SMP Wed May 13 10:06:09 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Thanks for the help,

	Alessandro

> Regards, Malahal.
>
> Alessandro De Salvo [Alessandro.DeSalvo at roma1.infn.it] wrote:
>> Hi,
>> this was an extract from the old logs, before Soumya's suggestion of
>> changing the rquota port in the conf file. The new logs are attached
>> (ganesha-20150611.log.gz), as well as the gstack of the ganesha process
>> while I was executing the hanging showmount
>> (ganesha-20150611.gstack.gz).
>> Thanks,
>>
>> Alessandro
>>
>>> On Thu, 2015-06-11 at 11:37 -0500, Malahal Naineni wrote:
>>> Soumya Koduri [skoduri at redhat.com] wrote:
>>>> CCing ganesha-devel to get more input.
>>>>
>>>> In case of ipv6 enabled, only v6 interfaces are used by NFS-Ganesha.
>>>
>>> I am not a network expert, but I have seen IPv4 traffic over an IPv6
>>> interface while fixing a few things before. This may be normal.
>>>
>>>> commit - git show 'd7e8f255', which got added in v2.2, has more details.
>>>>
>>>>> # netstat -ltaupn | grep 2049
>>>>> tcp6   4  0 :::2049          :::*             LISTEN      32080/ganesha.nfsd
>>>>> tcp6   1  0 x.x.x.2:2049     x.x.x.2:33285    CLOSE_WAIT  -
>>>>> tcp6   1  0 127.0.0.1:2049   127.0.0.1:39555  CLOSE_WAIT  -
>>>>> udp6   0  0 :::2049          :::*                         32080/ganesha.nfsd
>>>>
>>>>>>> I have enabled full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume, nfs-ganesha does not even start, complaining that it cannot bind the IPv6 rquota socket; but in fact there is nothing listening on that port on IPv6, so this should not happen:
>>>>>>>
>>>>>>> tcp6  0  0 :::111                   :::*  LISTEN  7433/rpcbind
>>>>>>> tcp6  0  0 :::2224                  :::*  LISTEN  9054/ruby
>>>>>>> tcp6  0  0 :::22                    :::*  LISTEN  1248/sshd
>>>>>>> udp6  0  0 :::111                   :::*          7433/rpcbind
>>>>>>> udp6  0  0 fe80::8c2:27ff:fef2:123  :::*          31238/ntpd
>>>>>>> udp6  0  0 fe80::230:48ff:fed2:123  :::*          31238/ntpd
>>>>>>> udp6  0  0 fe80::230:48ff:fed2:123  :::*          31238/ntpd
>>>>>>> udp6  0  0 fe80::230:48ff:fed2:123  :::*          31238/ntpd
>>>>>>> udp6  0  0 ::1:123                  :::*          31238/ntpd
>>>>>>> udp6  0  0 fe80::5484:7aff:fef:123  :::*          31238/ntpd
>>>>>>> udp6  0  0 :::123                   :::*          31238/ntpd
>>>>>>> udp6  0  0 :::824                   :::*          7433/rpcbind
>>>>>>>
>>>>>>> The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following:
>>>>>>>
>>>>>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
>>>>>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
>>>>>>> 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
>>>
>>> The above messages indicate that someone tried to restart ganesha. But
>>> ganesha failed to come up because the RQUOTA port (default is 875) is
>>> already in use by an old ganesha instance or some other program holding
>>> it. The new instance of ganesha will die, but if you are using systemd,
>>> it will try to restart automatically. We have disabled systemd auto
>>> restart in our environment as it was causing issues for debugging.
>>>
>>> What version of ganesha is this?
>>>
>>> Regards, Malahal.
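(The "error 98" in that log is easy to reproduce in isolation. A small illustrative Python sketch, nothing ganesha-specific, showing that a second bind to an already-bound IPv6 port fails with EADDRINUSE; it assumes the host has an IPv6 loopback, as the tcp6 output above implies.)

```python
import errno
import socket

def try_bind(port):
    """Bind an IPv6 TCP socket to ::1; return (socket, 0) or (None, errno)."""
    s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    try:
        s.bind(("::1", port))
        return s, 0
    except OSError as e:
        s.close()
        return None, e.errno

holder, err1 = try_bind(0)            # port 0: kernel picks a free port
port = holder.getsockname()[1]
loser, err2 = try_bind(port)          # same port again -> EADDRINUSE
# err2 is errno.EADDRINUSE (98 on Linux), exactly the error in the
# Bind_sockets_V6 warning: the old instance (or another daemon, e.g. a
# kernel rquotad) still holds the port, the new ganesha dies, and systemd
# restarts it in a loop.
holder.close()
```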
Alessandro De Salvo
2015-Jun-12 12:34 UTC
[Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Hi,
looking at the code, and having recompiled it with some extra debug added, I might be wrong, but it seems that in nfs_rpc_dispatcher_thread.c, function nfs_rpc_dequeue_req, the threads enter the while (!(wqe->flags & Wqe_LFlag_SyncDone)) loop and never exit from there. I do not know whether that is normal; I should read the code more carefully.
Cheers,

	Alessandro

On Fri, 2015-06-12 at 09:35 +0200, Alessandro De Salvo wrote:
> [full quote of the previous message snipped]

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
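(The loop described above, while (!(wqe->flags & Wqe_LFlag_SyncDone)), is a classic condition-wait: it only exits if some other thread both sets the flag and delivers the wakeup. A toy Python model of the two outcomes -- the flag value and structure here are hypothetical, only the names mirror the mail, and this is not ganesha's actual code.)

```python
import threading

Wqe_LFlag_SyncDone = 0x1   # hypothetical value; only the name comes from the mail

class Wqe:
    """Toy stand-in for a wait-queue entry, not ganesha's real struct."""
    def __init__(self):
        self.flags = 0
        self.cond = threading.Condition()

def dequeue_wait(wqe, timeout=2.0):
    """The suspect pattern: loop until SyncDone is set in wqe.flags."""
    with wqe.cond:
        while not (wqe.flags & Wqe_LFlag_SyncDone):
            if not wqe.cond.wait(timeout=timeout):
                return False   # timed out: the stuck state gstack would show
    return True

def sync_done(wqe):
    """Producer side: set the flag and wake waiters, both under the lock."""
    with wqe.cond:
        wqe.flags |= Wqe_LFlag_SyncDone
        wqe.cond.notify_all()

ok_wqe = Wqe()
t = threading.Thread(target=sync_done, args=(ok_wqe,))
t.start()
done = dequeue_wait(ok_wqe)    # completes: flag set and wakeup delivered
t.join()

stuck_wqe = Wqe()              # nobody ever calls sync_done() on this one
stuck = dequeue_wait(stuck_wqe, timeout=0.2)   # the loop never exits
```

If the producer side never runs (or signals without setting the flag), every dequeuing thread sits in that while loop forever, which would match threads "waiting to serve requests" while showmount hangs.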