Hi folks,
I've set up a simple two-node replicated Gluster 3.5 system on CentOS 6.
For performance reasons, I'm using NFS on the client side (Debian Wheezy)
and Gluster NFS on the server side. I've also implemented a simple
Heartbeat config on the two Gluster systems to do IP failover in the event
I lose a Gluster node or want to take a node offline for maintenance.
My issue is that NFS failover takes too long - approximately 15-20 minutes.
Using tcpdump, I'm able to confirm that the IP failover and the client
redirect to the failover host take place in a matter of seconds, so I don't
believe anything is wrong down at layer 2/3.
With tcpdump I can see the NFS client (10.107.98.211) repeatedly connecting
to the failover node (10.107.98.222), but no return packets from the server
to the client:
$ sudo tcpdump -i eth0 -n host 10.107.98.211
21:09:28.354879 IP 10.107.98.211.2651514551 > 10.107.98.222.2049: 108 getattr fh Unknown/3A4F474CDDA8283709DB42EBBC8D2B66B649D958377AE2AA604F4859B34149E7
21:09:41.442600 IP 10.107.98.211.2651514551 > 10.107.98.222.2049: 108 getattr fh Unknown/3A4F474CDDA8283709DB42EBBC8D266B649D958377AE2AA604F4859B34149E7
21:10:07.650088 IP 10.107.98.211.2651514551 > 10.107.98.222.2049: 108 getattr fh Unknown/3A4F474CDDA8283709DB42EBBC8D266B649D958377AE2AA604F4859B34149E7
...
Eventually, after 10-15 minutes, a response is sent back to the client and
NFS works properly again:
21:24:46.352946 IP 10.107.98.222.nfs > 10.107.98.211.892: Flags [S.], seq 1884793786, ack 2118543465, win 14480, options [mss 1460,sackOK,TS val 181785906 ecr 28589184,nop,wscale 7], length 0
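For what it's worth, here's a quick Python sketch that puts numbers on the capture above: it computes the gaps between the client's unanswered requests and the total outage up to the server's SYN-ACK, using only the timestamps copied from tcpdump (all on the same day). The gap roughly doubling (about 13 s, then about 26 s) looks consistent with TCP-level retransmission backoff, but that's an inference from three packets, not a diagnosis.

```python
from datetime import datetime

# Timestamps copied from the tcpdump output above (HH:MM:SS.microseconds).
client_retries = ["21:09:28.354879", "21:09:41.442600", "21:10:07.650088"]
server_reply = "21:24:46.352946"  # the SYN-ACK from 10.107.98.222


def ts(s):
    """Parse a tcpdump time-of-day stamp into a datetime (date is ignored)."""
    return datetime.strptime(s, "%H:%M:%S.%f")


retry_times = [ts(s) for s in client_retries]

# Gaps between successive unanswered client packets.
gaps = [(b - a).total_seconds() for a, b in zip(retry_times, retry_times[1:])]

# Total outage: first unanswered packet until the server's SYN-ACK.
outage = (ts(server_reply) - retry_times[0]).total_seconds()

print("gaps between retries (s):", [round(g, 3) for g in gaps])
print("outage: %.3f s (~%.1f min)" % (outage, outage / 60))
```

This prints gaps of 13.088 s and 26.207 s and an outage of 917.998 s (~15.3 min) for this particular capture.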
Has anyone been successful in implementing something like this using
Gluster NFS? I'm not certain whether this is an NFS issue (maybe a stale
file handle issue), something related to running NFS over TCP, or
something else altogether. There don't seem to be any additional clues
in either the client logs or the Gluster NFS log.
The NFS client mount options I'm using:
rw,noatime,nodiratime,vers=3,rsize=1048576,wsize=1048576,namlen=255,timeo=600,retrans=2,sec=sys,mountvers=3,mountproto=tcp,local_lock=none
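As a rough sanity check on timeo=600,retrans=2, here's a Python sketch of the client's retry timeline. It assumes the linear backoff described in nfs(5) (timeo is in tenths of a second, each retransmission adds another timeo, capped at 600 seconds) and assumes a hard mount, where a "major timeout" only logs "server not responding" and keeps retrying. The helper names are mine, purely for illustration, and this is a simplified model rather than what the kernel client actually does on the wire.

```python
# Mount options copied from the line above.
opts = ("rw,noatime,nodiratime,vers=3,rsize=1048576,wsize=1048576,namlen=255,"
        "timeo=600,retrans=2,sec=sys,mountvers=3,mountproto=tcp,local_lock=none")


def parse_mount_options(s):
    """Split a comma-separated option string into a dict; bare flags map to True."""
    out = {}
    for item in s.split(","):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def major_timeout_seconds(timeo_ds, retrans, cap_s=600.0):
    """Seconds until a major timeout, ASSUMING nfs(5)-style linear backoff."""
    step = timeo_ds / 10.0            # timeo is given in deciseconds
    total, timeout = 0.0, step
    for _ in range(retrans + 1):      # the original send plus `retrans` retries
        total += timeout
        timeout = min(timeout + step, cap_s)
    return total


o = parse_mount_options(opts)
timeo, retrans = int(o["timeo"]), int(o["retrans"])
print("timeo = %.0f s, retrans = %d" % (timeo / 10, retrans))
print("modelled time to major timeout: %.0f s" % major_timeout_seconds(timeo, retrans))
```

Under those assumptions the model gives 60 + 120 + 180 = 360 s, which on its own doesn't obviously account for the ~15 minutes seen in the capture.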
Thanks all for your time.
John