Mark Selby
2016-Mar-23 16:05 UTC
[Gluster-users] Slow NFS Mounts on one node in a replicated cluster
I have two servers running Ubuntu 14.04 and Gluster 3.7.6. Each volume is made up of two bricks replicated between the two servers.

I have this very strange situation whereby NFS mounts against ONE of the servers incur a 5 second wait during the mount phase. NFS performance is fine after the mount. FUSE mounts are fine on both servers. This slowness in NFS mounting occurs across ALL volumes, but on only one of the servers.

There are some messages in nfs.log on the server that has the slow mounts that are not on the server where the mounts are instant. I have done some Google searching but nothing definitive comes up.

Has anyone ever seen this? Any pointers on where else to look or how to further debug? I see that the glusterfs process is responsible for mounting and am not sure how to ask it to display debug info.

I have done a tcpdump and see nothing unusual, other than that the response to the mount request takes 5 seconds to complete on one node in the set.

root@dc1strg001 /root 632# rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    3   tcp  38465  mountd
    100005    1   tcp  38466  mountd
    100003    3   tcp   2049  nfs
    100021    4   tcp  38468  nlockmgr
    100024    1   udp  32765  status
    100024    1   tcp  32765  status
    100227    3   tcp   2049
    100021    1   udp    643  nlockmgr
    100021    1   tcp    645  nlockmgr

root@dc1strg001 /root 635# gluster volume status home
Status of volume: home
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dc1strg001:/zfspool/glusterfs/home/da
ta                                          49159     0          Y       32185
Brick dc1strg002:/zfspool/glusterfs/home/da
ta                                          49159     0          Y       31317
NFS Server on localhost                     2049      0          Y       466
Self-heal Daemon on localhost               N/A       N/A        Y       482
NFS Server on dc1strg002                    2049      0          Y       31950
Self-heal Daemon on dc1strg002              N/A       N/A        Y       31958

Task Status of Volume home
------------------------------------------------------------------------------
There are no active volume tasks

root@dc1strg001 /var/log/glusterfs 646# gluster volume get home all | grep nfs
performance.nfs.flush-behind                 on
performance.nfs.write-behind-window-size     1MB
performance.nfs.strict-o-direct              off
performance.nfs.strict-write-ordering        off
performance.nfs.write-behind                 on
performance.nfs.read-ahead                   off
performance.nfs.io-cache                     off
performance.nfs.quick-read                   off
performance.nfs.stat-prefetch                off
performance.nfs.io-threads                   off
nfs.enable-ino32                             no
nfs.mem-factor                               15
nfs.export-dirs                              on
nfs.export-volumes                           on
nfs.addr-namelookup                          off
nfs.dynamic-volumes                          off
nfs.register-with-portmap                    on
nfs.outstanding-rpc-limit                    16
nfs.port                                     2049
nfs.rpc-auth-unix                            on
nfs.rpc-auth-null                            on
nfs.rpc-auth-allow                           all
nfs.rpc-auth-reject                          none
nfs.ports-insecure                           off
nfs.trusted-sync                             off
nfs.trusted-write                            off
nfs.volume-access                            read-write
nfs.export-dir
nfs.disable                                  false
nfs.nlm                                      on
nfs.acl                                      on
nfs.mount-udp                                off
nfs.mount-rmtab                              /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd                                /sbin/rpc.statd
nfs.server-aux-gids                          off
nfs.drc                                      off
nfs.drc-size                                 0x20000
nfs.read-size                                (1 * 1048576ULL)
nfs.write-size                               (1 * 1048576ULL)
nfs.readdir-size                             (1 * 1048576ULL)
nfs.exports-auth-enable                      (null)
nfs.auth-refresh-interval-sec                (null)
nfs.auth-cache-ttl-sec                       (null)

[2016-03-23 15:29:48.640220] W [rpcsvc.c:278:rpcsvc_program_actor] 0-rpc-service: RPC program version not available (req 100003 2) for 10.123.14.14:42552
[2016-03-23 15:29:48.640293] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2016-03-23 15:30:28.488880] I [socket.c:3382:socket_submit_reply] 0-socket.nfs-server: not connected (priv->connected = -1)
[2016-03-23 15:30:28.488953] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x18363d6c, Program: MOUNT3, ProgVers: 3, Proc: 3) to rpc-transport (socket.nfs-server)
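
For anyone wanting to compare the two nfs.log files more systematically than by eye, something like the following might help. It is only a rough sketch, assuming every log line follows the same "[timestamp] LEVEL [file:line:function] domain: message" layout as the entries in this post. The names LOG_LINE, signatures and only_in_first are made up for this illustration and are not part of Gluster. It reduces each line to a (level, function) pair and reports the pairs that appear in the first log but never in the second.

```python
import re
import sys
from collections import Counter

# Assumed line layout, based only on the nfs.log entries quoted in this post:
# [2016-03-23 15:29:48.640220] W [rpcsvc.c:278:rpcsvc_program_actor] 0-rpc-service: text
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>[^:]+):\s*(?P<msg>.*)$"
)


def signatures(lines):
    """Count (level, function) pairs, ignoring timestamps and message details."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # lines that do not fit the assumed layout are skipped
            counts[(match.group("level"), match.group("func"))] += 1
    return counts


def only_in_first(first_lines, second_lines):
    """Return signatures (with counts) seen in the first log but not the second."""
    first = signatures(first_lines)
    second = signatures(second_lines)
    return {sig: n for sig, n in first.items() if sig not in second}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_nfs_logs.py slow-server-nfs.log fast-server-nfs.log
    with open(sys.argv[1]) as slow, open(sys.argv[2]) as fast:
        unique = only_in_first(slow, fast)
    for (level, func), count in sorted(unique.items()):
        print(f"{level} {func}: {count}")
```

Grouping by function name rather than full message text is a deliberate choice here, because details such as client addresses and XIDs differ on every line and would otherwise make every entry look unique.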