thr3ads.net - Gluster users - [Gluster-users] NFS failover with Gluster NFS [Jun 2014]

John Malconian

2014-Jun-12 22:41 UTC

[Gluster-users] NFS failover with Gluster NFS

Hi folks,

I've set up a simple, two-node, replicated Gluster 3.5 system on CentOS 6.
For performance reasons, I'm using NFS (Debian Wheezy on the client side)
and Gluster NFS (on the server side). I've also implemented a simple
Heartbeat config on the two Gluster systems to do IP failover in the event
I lose a Gluster node or want to take a node offline for maintenance.

My issue is that NFS failover takes too long: approximately 15-20 minutes.
Using tcpdump, I'm able to confirm that the IP failover and the client
redirect to the failover host take place in a matter of seconds, so I don't
believe anything is wrong down at the layer 2/3 level.

With tcpdump, I see the NFS client (10.107.98.211) repeatedly sending
requests to the failover node (10.107.98.222), but no return packets from
the server to the client:

$ sudo tcpdump -i eth0 -n host 10.107.98.211

21:09:28.354879 IP 10.107.98.211.2651514551 > 10.107.98.222.2049: 108 getattr fh Unknown/3A4F474CDDA8283709DB42EBBC8D2B66B649D958377AE2AA604F4859B34149E7

21:09:41.442600 IP 10.107.98.211.2651514551 > 10.107.98.222.2049: 108 getattr fh Unknown/3A4F474CDDA8283709DB42EBBC8D266B649D958377AE2AA604F4859B34149E7

21:10:07.650088 IP 10.107.98.211.2651514551 > 10.107.98.222.2049: 108 getattr fh Unknown/3A4F474CDDA8283709DB42EBBC8D266B649D958377AE2AA604F4859B34149E7

...

Eventually, after 10-15 minutes a response is sent back to the client and
NFS works properly again:

21:24:46.352946 IP 10.107.98.222.nfs > 10.107.98.211.892: Flags [S.], seq 1884793786, ack 2118543465, win 14480, options [mss 1460,sackOK,TS val 181785906 ecr 28589184,nop,wscale 7], length 0
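
As a rough aid to reading the capture, here is a minimal Python sketch that computes the gaps between the three client requests and the time until the first server reply. The timestamps are copied from the tcpdump output above; the function names are made up for illustration, and the sketch assumes all packets fall on the same day.

```python
from datetime import datetime

# Timestamps copied from the capture above (assumed to be on the same day).
REQUESTS = ["21:09:28.354879", "21:09:41.442600", "21:10:07.650088"]
FIRST_REPLY = "21:24:46.352946"


def parse(ts):
    """Parse a tcpdump-style HH:MM:SS.ffffff timestamp."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


def gaps_seconds(stamps):
    """Return the delay in seconds between consecutive timestamps."""
    times = [parse(s) for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


def outage_seconds(first_request, first_reply):
    """Return the seconds elapsed from the first request to the first reply."""
    return (parse(first_reply) - parse(first_request)).total_seconds()


if __name__ == "__main__":
    print("gaps between requests (s):", gaps_seconds(REQUESTS))
    print("time to first reply (s):", outage_seconds(REQUESTS[0], FIRST_REPLY))
```

The gaps come out at roughly 13 and 26 seconds, and the first reply arrives about 918 seconds (a little over 15 minutes) after the first request. Gaps that double would be consistent with some form of exponential retransmission backoff, but the capture alone does not show which layer is doing the retrying.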

Has anyone been successful in implementing something like this using
Gluster NFS? I'm not certain if this is an NFS issue (maybe a stale file
handle issue) or maybe something related to running NFS over TCP. Or
perhaps something else altogether. There don't seem to be any
additional clues in either the client logs or the Gluster NFS log.

My NFS client mount options used:

rw,noatime,nodiratime,vers=3,rsize=1048576,wsize=1048576,namlen=255,timeo=600,retrans=2,sec=sys,mountvers=3,mountproto=tcp,local_lock=none

Thanks all for your time.

John

Eliezer Croitoru

2014-Jun-12 23:02 UTC

[Gluster-users] NFS failover with Gluster NFS

> My NFS client mount options used:
>
> rw,noatime,nodiratime,vers=3,rsize=1048576,wsize=1048576,namlen=255,timeo=600,retrans=2,sec=sys,mountvers=3,mountproto=tcp,local_lock=none

How do you verify the issue?
If the IP is being taken over, then the only problems you can have are:
ARP
NFS connections which are not aborted
Other unknown side effects

First, try the "hard" option of the NFS mount, so that if the error
occurs while a file is in transit, it will recover automatically.
Also, timeo should be less than 600: timeo is given in tenths of a
second, so 600 means 60 seconds.
You should use something like 10 at most.
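
According to nfs(5), timeo is expressed in tenths of a second, and for NFS over TCP the client uses linear backoff: the wait grows by timeo after each retransmission, up to a maximum of 600 seconds. The Python sketch below is a simplified model of that description, not of the kernel's actual behaviour, which also depends on the state of the TCP connection. The function names are made up for illustration.

```python
def parse_mount_options(opts):
    """Split a comma-separated mount option string into a dict.

    Flags without a value (e.g. "rw") map to True.
    """
    parsed = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def timeo_seconds(options):
    """Convert the timeo option (deciseconds) to seconds."""
    return int(options["timeo"]) / 10.0


def backoff_waits(timeo_ds, attempts, cap_s=600.0):
    """Model linear backoff: wait timeo, then 2*timeo, ..., capped at cap_s.

    Simplified reading of nfs(5); real client behaviour may differ.
    """
    base = timeo_ds / 10.0
    return [min(base * (i + 1), cap_s) for i in range(attempts)]


if __name__ == "__main__":
    opts = parse_mount_options(
        "rw,noatime,nodiratime,vers=3,rsize=1048576,wsize=1048576,"
        "namlen=255,timeo=600,retrans=2,sec=sys,mountvers=3,"
        "mountproto=tcp,local_lock=none"
    )
    print("timeo in seconds:", timeo_seconds(opts))
    print("waits with timeo=600:", backoff_waits(600, 3))
    print("waits with timeo=10:", backoff_waits(10, 3))
```

Under this model, timeo=600 gives waits of 60, 120 and 180 seconds, while timeo=10 gives 1, 2 and 3 seconds.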

Eliezer
