thr3ads.net - Gluster users - [Gluster-users] Slow NFS Mounts on one node in a replicated cluster [Mar 2016]

Mark Selby
2016-Mar-23 16:05 UTC
[Gluster-users] Slow NFS Mounts on one node in a replicated cluster

I have two servers running Ubuntu 14.04 and Gluster 3.7.6.

Each volume is made up of two bricks, replicated between the two servers.

I have a very strange situation whereby NFS mounts against ONE of the 
servers incur a 5-second wait during the mount phase. NFS performance 
is fine after the mount.

FUSE mounts are fine on both servers.

This slowness in NFS mounting occurs across ALL volumes, but on only one 
of the servers.

There are some messages in nfs.log on the server that has the slow 
mounts (shown at the end of this message) that do not appear on the 
server where the mounts are instant.

I have done some Google searching but nothing definitive comes up.

Has anyone ever seen this?

Any pointers on where else to look or how to further debug?

I see that the glusterfs process is responsible for mounting and am not 
sure how to ask it to display debug info.

I have done a tcpdump and see nothing unusual, other than that the 
response to the mount request takes 5 seconds to complete on one node 
in the set.

root@dc1strg001 /root 632# rpcinfo -p
    program vers proto   port  service
     100000    4   tcp    111  portmapper
     100000    3   tcp    111  portmapper
     100000    2   tcp    111  portmapper
     100000    4   udp    111  portmapper
     100000    3   udp    111  portmapper
     100000    2   udp    111  portmapper
     100005    3   tcp  38465  mountd
     100005    1   tcp  38466  mountd
     100003    3   tcp   2049  nfs
     100021    4   tcp  38468  nlockmgr
     100024    1   udp  32765  status
     100024    1   tcp  32765  status
     100227    3   tcp   2049
     100021    1   udp    643  nlockmgr
     100021    1   tcp    645  nlockmgr
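
Since the two servers are supposed to be identical, one way to use the listing above is to capture `rpcinfo -p` on both nodes and compare the registrations. The following is only a minimal sketch of that comparison: the function names `parse_rpcinfo` and `diff_registrations` are hypothetical, the sample text is a subset of the rows shown above, and nothing here claims that any particular row is the cause of the delay.

```python
# Subset of the rpcinfo -p output from the slow node, used as sample input.
RPCINFO_SAMPLE = """\
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100005    3   tcp  38465  mountd
    100003    3   tcp   2049  nfs
    100021    4   tcp  38468  nlockmgr
    100227    3   tcp   2049
    100021    1   udp    643  nlockmgr
    100021    1   tcp    645  nlockmgr
"""


def parse_rpcinfo(text):
    """Return a set of (program, version, proto, port, service) tuples."""
    entries = set()
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and blank lines; data rows start with a number.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        program, version, proto, port = fields[:4]
        # Some rows (e.g. program 100227 above) have no service name.
        service = fields[4] if len(fields) > 4 else ""
        entries.add((int(program), int(version), proto, int(port), service))
    return entries


def diff_registrations(slow, fast):
    """Return (only_on_slow, only_on_fast), each as a sorted list."""
    return sorted(slow - fast), sorted(fast - slow)


if __name__ == "__main__":
    slow = parse_rpcinfo(RPCINFO_SAMPLE)
    # In practice `fast` would come from the other node's rpcinfo -p output.
    fast = parse_rpcinfo(RPCINFO_SAMPLE)
    only_slow, only_fast = diff_registrations(slow, fast)
    print("only on slow node:", only_slow)
    print("only on fast node:", only_fast)
```

Ports that are assigned dynamically may legitimately differ between nodes, so a difference in the port column alone is not necessarily meaningful.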

root@dc1strg001 /root 635# gluster volume status home
Status of volume: home
Gluster process                                  TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------
Brick dc1strg001:/zfspool/glusterfs/home/data    49159     0          Y       32185
Brick dc1strg002:/zfspool/glusterfs/home/data    49159     0          Y       31317
NFS Server on localhost                          2049      0          Y       466
Self-heal Daemon on localhost                    N/A       N/A        Y       482
NFS Server on dc1strg002                         2049      0          Y       31950
Self-heal Daemon on dc1strg002                   N/A       N/A        Y       31958

Task Status of Volume home
------------------------------------------------------------------------------
There are no active volume tasks

root@dc1strg001 /var/log/glusterfs 646# gluster volume get home all | grep nfs
performance.nfs.flush-behind on
performance.nfs.write-behind-window-size 1MB
performance.nfs.strict-o-direct off
performance.nfs.strict-write-ordering off
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
nfs.enable-ino32 no
nfs.mem-factor 15
nfs.export-dirs on
nfs.export-volumes on
nfs.addr-namelookup off
nfs.dynamic-volumes off
nfs.register-with-portmap on
nfs.outstanding-rpc-limit 16
nfs.port 2049
nfs.rpc-auth-unix on
nfs.rpc-auth-null on
nfs.rpc-auth-allow all
nfs.rpc-auth-reject none
nfs.ports-insecure off
nfs.trusted-sync off
nfs.trusted-write off
nfs.volume-access read-write
nfs.export-dir
nfs.disable false
nfs.nlm on
nfs.acl on
nfs.mount-udp off
nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd /sbin/rpc.statd
nfs.server-aux-gids off
nfs.drc off
nfs.drc-size 0x20000
nfs.read-size                           (1 * 1048576ULL)
nfs.write-size                          (1 * 1048576ULL)
nfs.readdir-size                        (1 * 1048576ULL)
nfs.exports-auth-enable (null)
nfs.auth-refresh-interval-sec (null)
nfs.auth-cache-ttl-sec (null)

[2016-03-23 15:29:48.640220] W [rpcsvc.c:278:rpcsvc_program_actor] 0-rpc-service: RPC program version not available (req 100003 2) for 10.123.14.14:42552
[2016-03-23 15:29:48.640293] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2016-03-23 15:30:28.488880] I [socket.c:3382:socket_submit_reply] 0-socket.nfs-server: not connected (priv->connected = -1)
[2016-03-23 15:30:28.488953] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x18363d6c, Program: MOUNT3, ProgVers: 3, Proc: 3) to rpc-transport (socket.nfs-server)
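
To line these messages up against the timing of a slow mount, it can help to split each entry into its fields. The sketch below assumes the entry layout seen in the excerpt above (timestamp, single-letter level, `[file:line:function]`, a domain, then the message) and also tolerates the mail client's line wrapping. The names `join_wrapped` and `parse_entries` are hypothetical, and the regular expression is inferred from these four lines only, not taken from Gluster documentation. Going by the rpcinfo table earlier in this message, the `req 100003 2` in the first warning would correspond to a request for version 2 of the `nfs` program, of which only version 3 is registered.

```python
import re
from datetime import datetime

# Log excerpt as it appears (wrapped) in the original message.
NFS_LOG_SAMPLE = """\
[2016-03-23 15:29:48.640220] W [rpcsvc.c:278:rpcsvc_program_actor]
0-rpc-service: RPC program version not available (req 100003 2) for
10.123.14.14:42552
[2016-03-23 15:29:48.640293] E
[rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
to complete successfully
[2016-03-23 15:30:28.488880] I [socket.c:3382:socket_submit_reply]
0-socket.nfs-server: not connected (priv->connected = -1)
[2016-03-23 15:30:28.488953] E [rpcsvc.c:1314:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x18363d6c, Program:
MOUNT3, ProgVers: 3, Proc: 3) to rpc-transport (socket.nfs-server)
"""

# A new entry starts with a bracketed date; anything else is a continuation.
ENTRY_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} ")

# Layout assumed from the excerpt above.
ENTRY_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+(?P<msg>.*)"
)


def join_wrapped(text):
    """Rejoin wrapped lines so that each log entry is a single string."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def parse_entries(text):
    """Return a list of dicts, one per entry that matches ENTRY_RE."""
    parsed = []
    for entry in join_wrapped(text):
        match = ENTRY_RE.match(entry)
        if match is None:
            continue  # layout differs from the assumed one; skip it
        fields = match.groupdict()
        fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S.%f")
        fields["line"] = int(fields["line"])
        parsed.append(fields)
    return parsed


if __name__ == "__main__":
    for e in parse_entries(NFS_LOG_SAMPLE):
        if e["level"] in ("W", "E"):
            print(e["ts"].isoformat(), e["level"], e["func"], "-", e["msg"])
```

The parsed timestamps can then be compared with the time at which a test mount was started on the slow node.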