We have a Gluster 3.2.5 environment using NFS mounts, which is generally
stable. However, we've identified an issue where the NFS server hangs
when we do a large (>200 MB) write to one of the mounts. Unfortunately
there is next to nothing in the nfs.log file, other than a complaint
that a brick didn't respond within the timeout interval. The NFS server
gets to the point where the only way to recover is to reboot the box
(our gluster nodes mount the volumes using gluster NFS over loopback).
Below is the config of the volume that failed this morning - not sure
if it is a tuning issue or a bug. If nothing else, is there a way to
improve the debugging of the gluster nfs daemon?
[root@dresproddns02 glusterfs]# gluster volume info svn
Type: Replicate
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhesproddns01:/gluster/svn
Brick2: rhesproddns02:/gluster/svn
Brick3: dresproddns01:/gluster/svn
Brick4: dresproddns02:/gluster/svn
Options Reconfigured:
nfs.rpc-auth-allow: 127.0.0.1
performance.client-io-threads: 1
performance.flush-behind: on
network.ping-timeout: 5
performance.stat-prefetch: on
nfs.disable: off
nfs.register-with-portmap: on
auth.allow: 10.250.53.*,10.252.248.*,169.254.*,127.0.0.1
performance.cache-size: 256Mb
performance.write-behind-window-size: 128Mb
performance.io-cache: on
performance.io-thread-count: 64
performance.quick-read: on
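In case it helps, here is the sort of thing I've been considering - a sketch,
not something I've verified on 3.2.5. The option and flag names are standard
gluster CLI, but whether diagnostics.client-log-level reaches the NFS daemon,
and the exact arguments glusterd starts it with, are assumptions on my part:

```shell
# Raise client-side log verbosity; the gluster NFS server is built on the
# client stack, so this may also raise nfs.log verbosity (unverified on 3.2.5).
gluster volume set svn diagnostics.client-log-level DEBUG

# Alternatively, restart the NFS daemon by hand with debug logging.
# The volfile id below is an assumption - check `ps ax | grep nfs` for the
# exact arguments glusterd uses before killing the running process.
glusterfs --volfile-id gluster/nfs \
          --volfile-server localhost \
          --log-level=DEBUG \
          --log-file=/var/log/glusterfs/nfs.log

# If it turns out to be tuning, shrinking the write-behind window is an easy
# test: 128MB of buffered writes flushing at once might stall a brick long
# enough to trip our 5-second network.ping-timeout.
gluster volume set svn performance.write-behind-window-size 16MB
```

If anyone knows whether the NFS translator honors the client log-level option
in 3.2.x, or has a better way to get debug output from it, I'd appreciate it.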