Hello, I have a cluster with 2 nodes that I want to access using NFS, there is no firewall between the server. I'm using gluster 3.9.0 The volumes mount correctly on a client machine, even if I get a strange "portmap query retrying: RPC: Program not registered" [root at nf1 ~]# mount -vvv -t nfs -o nfsvers=3,rw gluster-nfs-a:/replicated-data /export/replicated-data mount: fstab path: "/etc/fstab" ......... mount: external mount: argv[5] = "rw,nfsvers=3" mount.nfs: timeout set for Thu Jan 19 21:51:05 2017 mount.nfs: trying text-based options 'nfsvers=3,addr=10.1.0.1' mount.nfs: prog 100003, trying vers=3, prot=6 mount.nfs: trying 10.4.0.28 prog 100003 vers 3 prot TCP port 2049 ....... mount.nfs: prog 100005, trying vers=3, prot=17 mount.nfs: portmap query retrying: RPC: Program not registered ....... mount.nfs: prog 100005, trying vers=3, prot=6 mount.nfs: trying 10.4.0.28 prog 100005 vers 3 prot TCP port 38465 gluster-nfs-a:/replicated-data on /export/replicated-data type nfs (rw,nfsvers=3) When I try to lock a file on the client with something like: flock -x -w 10 100 file (...) the lock doesn't work and I get those line on the server nfs.log: [2017-01-19 20:03:03.621556] I [rpc-clnt.c:1946:rpc_clnt_reconfig] 0-replicated-data-client-2: changing port to 49159 (from 0) [2017-01-19 20:03:03.626031] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 0-replicated-data-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2017-01-19 20:03:03.626661] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-replicated-data-client-0: Connected to replicated-data-client-0, attached to remote volume '/gluster-nfs-zpool/replicated-data'. [2017-01-19 20:03:03.626693] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-replicated-data-client-0: Server and Client lk-version numbers are not same, reopening the fds [2017-01-19 20:03:03.626803] I [MSGID: 108005] [afr-common.c:4429:afr_notify] 0-replicated-data-replicate-0: Subvolume 'replicated-data-client-0' came back up; going online. [2017-01-19 20:03:03.626998] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-replicated-data-client-0: Server lk version = 1 [2017-01-19 20:03:03.630498] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 0-replicated-data-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2017-01-19 20:03:03.631152] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-replicated-data-client-2: Connected to replicated-data-client-2, attached to remote volume '/gluster-nfs-zpool/replicated-data'. [2017-01-19 20:03:03.631180] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-replicated-data-client-2: Server and Client lk-version numbers are not same, reopening the fds [2017-01-19 20:03:03.631651] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-replicated-data-client-2: Server lk version = 1 [2017-01-19 20:03:03.632563] I [MSGID: 108031] [afr-common.c:2178:afr_local_discovery_cbk] 0-replicated-data-replicate-0: selecting local read_child replicated-data-client-0 [2017-01-19 20:03:29.463006] W [MSGID: 112165] [nlm4.c:1484:nlm4svc_lock_common] 0-nfs-NLM: NLM in grace period [2017-01-19 20:03:32.742975] W [MSGID: 112165] [nlm4.c:1804:nlm4svc_unlock] 0-nfs-NLM: NLM in grace period The message "W [MSGID: 112165] [nlm4.c:1804:nlm4svc_unlock] 0-nfs-NLM: NLM in grace period" repeated 4 times between [2017-01-19 20:03:32.742975] and [2017-01-19 20:03:52.742771] [2017-01-19 20:03:57.743581] W [MSGID: 112057] [nlm4.c:1744:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL [No locks available] [2017-01-19 20:03:57.743652] W [MSGID: 112122] [nlm4.c:1759:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume [Operation not permitted] [2017-01-19 20:05:03.237220] E [MSGID: 112163] [nlm4.c:570:nsm_monitor] 0-nfs-NLM: clnt_call(): RPC: Success [2017-01-19 20:06:10.566790] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-replicated-data-client-2: server 10.1.0.33:49159 has not responded in the last 42 seconds, disconnecting. [2017-01-19 20:06:13.380796] I [socket.c:3396:socket_submit_request] 0-replicated-data-client-2: not connected (priv->connected = -1) [2017-01-19 20:06:13.380830] W [rpc-clnt.c:1639:rpc_clnt_submit] 0-replicated-data-client-2: failed to submit rpc-request (XID: 0x29 Program: GlusterFS 3.3, ProgVers: 330, Proc: 26) to rpc-transport (replicated-data-client-2) [2017-01-19 20:06:13.380846] W [MSGID: 114031] [client-rpc-fops.c:2529:client3_3_lk_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected] [2017-01-19 20:06:45.739806] W [rpc-clnt.c:1639:rpc_clnt_submit] 0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2a Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (replicated-data-client-2) [2017-01-19 20:07:45.651815] W [MSGID: 114031] [client-rpc-fops.c:799:client3_3_statfs_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected] [2017-01-19 20:07:47.535963] W [rpc-clnt.c:1639:rpc_clnt_submit] 0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2d Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (replicated-data-client-2) [2017-01-19 20:08:45.764782] W [rpc-clnt.c:1639:rpc_clnt_submit] 0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2e Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (replicated-data-client-2) [2017-01-19 20:08:47.553021] W [rpc-clnt.c:1639:rpc_clnt_submit] 0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2f Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (replicated-data-client-2) The message "W [MSGID: 114031] [client-rpc-fops.c:799:client3_3_statfs_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected]" repeated 3 times between [2017-01-19 20:07:45.651815] and [2017-01-19 20:08:47.553057] [2017-01-19 20:09:25.952509] E [MSGID: 112167] [nlm4.c:1009:nlm4_establish_callback] 0-nfs-NLM: Unable to get NLM port of the client. Is the firewall running on client? OR Are RPC services running (rpcinfo -p)? [2017-01-19 20:09:25.953149] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] ))))) 0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3) op(LK(26)) called at 2017-01-19 20:05:13.206864 (xid=0x23) [2017-01-19 20:09:25.953181] W [MSGID: 114031] [client-rpc-fops.c:2529:client3_3_lk_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected] [2017-01-19 20:09:25.953845] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] ))))) 0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3) op(LK(26)) called at 2017-01-19 20:05:23.235063 (xid=0x24) [2017-01-19 20:09:25.954461] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] ))))) 0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3) op(LK(26)) called at 2017-01-19 20:05:43.299167 (xid=0x26) [2017-01-19 20:09:25.955045] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] ))))) 0-replicated-data-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2017-01-19 20:05:28.562661 (xid=0x25) [2017-01-19 20:09:25.955077] W [rpc-clnt-ping.c:202:rpc_clnt_ping_cbk] 0-replicated-data-client-2: socket disconnected [2017-01-19 20:09:25.955567] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] ))))) 0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3) op(STATFS(14)) called at 2017-01-19 20:05:45.774191 (xid=0x27) [2017-01-19 20:09:25.955612] W [MSGID: 114031] [client-rpc-fops.c:799:client3_3_statfs_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected] [2017-01-19 20:09:25.956246] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f29941a16fb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2993f6879e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2993f688ae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f2993f6a004] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x110)[0x7f2993f6a8d0] ))))) 0-replicated-data-client-2: forced unwinding frame type(GlusterFS 3.3) op(STATFS(14)) called at 2017-01-19 20:05:47.549472 (xid=0x28) [2017-01-19 20:09:25.956373] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-replicated-data-client-2: disconnected from replicated-data-client-2. Client process will keep trying to connect to glusterd until brick's port is available [2017-01-19 20:09:36.592738] I [rpc-clnt.c:1946:rpc_clnt_reconfig] 0-replicated-data-client-2: changing port to 49159 (from 0) [2017-01-19 20:09:36.597437] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 0-replicated-data-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2017-01-19 20:09:36.598036] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-replicated-data-client-2: Connected to replicated-data-client-2, attached to remote volume '/gluster-nfs-zpool/replicated-data'. [2017-01-19 20:09:36.598059] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-replicated-data-client-2: Server and Client lk-version numbers are not same, reopening the fds [2017-01-19 20:09:36.598072] I [MSGID: 114042] [client-handshake.c:1053:client_post_handshake] 0-replicated-data-client-2: 1 fds open - Delaying child_up until they are re-opened [2017-01-19 20:09:36.598343] I [MSGID: 114041] [client-handshake.c:675:client_child_up_reopen_done] 0-replicated-data-client-2: last fd open'd/lock-self-heal'd - notifying CHILD-UP [2017-01-19 20:09:36.598036] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-replicated-data-client-2: Connected to replicated-data-client-2, attached to remote volume '/gluster-nfs-zpool/replicated-data'. [2017-01-19 20:09:36.598059] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-replicated-data-client-2: Server and Client lk-version numbers are not same, reopening the fds [2017-01-19 20:09:36.598072] I [MSGID: 114042] [client-handshake.c:1053:client_post_handshake] 0-replicated-data-client-2: 1 fds open - Delaying child_up until they are re-opened [2017-01-19 20:09:36.598343] I [MSGID: 114041] [client-handshake.c:675:client_child_up_reopen_done] 0-replicated-data-client-2: last fd open'd/lock-self-heal'd - notifying CHILD-UP [2017-01-19 20:09:36.598506] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-replicated-data-client-2: Server lk version = 1 [2017-01-19 20:10:47.538183] I [MSGID: 108031] [afr-common.c:2178:afr_local_discovery_cbk] 0-replicated-data-replicate-0: selecting local read_child replicated-data-client-0 The message "W [MSGID: 114031] [client-rpc-fops.c:2529:client3_3_lk_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected]" repeated 2 times between [2017-01-19 20:09:25.953181] and [2017-01-19 20:09:25.954495] [2017-01-19 20:09:25.956278] W [MSGID: 114031] [client-rpc-fops.c:799:client3_3_statfs_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected] [2017-01-19 20:14:18.649616] W [socket.c:588:__socket_rwv] 0-NLM-client: readv on 10.4.0.13:32803 failed (No data available) [2017-01-19 20:26:18.648786] I [MSGID: 101053] [mem-pool.c:638:mem_pool_destroy] 0-nfs-server: size=588 max=1 total=3 [2017-01-19 20:26:18.648831] I [MSGID: 101053] [mem-pool.c:638:mem_pool_destroy] 0-nfs-server: size=124 max=1 total=3 As I said there is no firewall and all the ports listed here are open and reachable from the client on both replica servers: program vers proto port service 100000 4 tcp 111 portmapper 100000 3 tcp 111 portmapper 100000 2 tcp 111 portmapper 100000 4 udp 111 portmapper 100000 3 udp 111 portmapper 100000 2 udp 111 portmapper 100005 3 tcp 38465 mountd 100005 1 tcp 38466 mountd 100003 3 tcp 2049 nfs 100024 1 udp 53548 status 100024 1 tcp 35281 status 100021 4 tcp 38468 nlockmgr 100021 1 udp 914 nlockmgr 100227 3 tcp 2049 nfs_acl 100021 1 tcp 917 nlockmgr Also I don't understand this: [2017-01-19 20:03:57.743581] W [MSGID: 112057] [nlm4.c:1744:nlm4_unlock_resume] 0-nfs-NLM: nlm_get_uniq() returned NULL [No locks available] [2017-01-19 20:03:57.743652] W [MSGID: 112122] [nlm4.c:1759:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume [Operation not permitted] [2017-01-19 20:05:03.237220] E [MSGID: 112163] [nlm4.c:570:nsm_monitor] 0-nfs-NLM: clnt_call(): RPC: Success [2017-01-19 20:06:10.566790] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-replicated-data-client-2: server 10.1.0.33:49159 has not responded in the last 42 seconds, disconnecting. Why the server I'm connected is trying to connected to the replica: 0-replicated-data-client-2: server 10.4.0.33:49159 ? And this: [2017-01-19 20:08:47.553021] W [rpc-clnt.c:1639:rpc_clnt_submit] 0-replicated-data-client-2: failed to submit rpc-request (XID: 0x2f Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (replicated-data-client-2) The message "W [MSGID: 114031] [client-rpc-fops.c:799:client3_3_statfs_cbk] 0-replicated-data-client-2: remote operation failed [Transport endpoint is not connected]" repeated 3 times between [2017-01-19 20:07:45.651815] and [2017-01-19 20:08:47.553057] [2017-01-19 20:09:25.952509] E [MSGID: 112167] [nlm4.c:1009:nlm4_establish_callback] 0-nfs-NLM: Unable to get NLM port of the client. Is the firewall running on client? OR Are RPC services running (rpcinfo -p)? Is here talking about the client trying to lock a file in the volume, or actually the other replica server on 10.1.0.33 ? Any idea? Thanks, C -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20170119/f1849a91/attachment.html>