Gerald Brandt
2012-Dec-03 01:32 UTC
[Gluster-users] Server temporarily failed, causing NFS client disconnect.
Hi,

I have a two-server replicated volume. The main server is running 3.3.0 and the second server is running 3.3.1. The clients connect over NFS, using the Gluster NFS server on the 3.3.0 machine.

Today, my 3.3.1 server went down. I don't know why yet; I'll figure that out tomorrow. When the 3.3.1 server went down, the log files on the 3.3.0 server (the NFS server the clients are currently connected to) filled up with:

[2012-12-02 11:20:48.548129] C [client-handshake.c:126:rpc_client_ping_timer_expired] 0-NFS_RAID1_FO-client-0: server 192.168.10.2:24011 has not responded in the last 42 seconds, disconnecting.
[2012-12-02 11:20:48.634077] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0xd0) [0x7f6672c145b0] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0) [0x7f6672c14220] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f6672c1414e]))) 0-NFS_RAID1_FO-client-0: forced unwinding frame type(GlusterFS 3.1) op(FINODELK(30)) called at 2012-12-02 11:07:13.201686 (xid=0x28455049x)
[2012-12-02 11:20:48.634115] W [client3_1-fops.c:1545:client3_1_finodelk_cbk] 0-NFS_RAID1_FO-client-0: remote operation failed: Transport endpoint is not connected
[2012-12-02 11:20:48.634188] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0xd0) [0x7f6672c145b0] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0) [0x7f6672c14220] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f6672c1414e]))) 0-NFS_RAID1_FO-client-0: forced unwinding frame type(GlusterFS 3.1) op(FINODELK(30)) called at 2012-12-02 11:07:13.201754 (xid=0x28455050x)
[2012-12-02 11:20:48.650242] W [client3_1-fops.c:1545:client3_1_finodelk_cbk] 0-NFS_RAID1_FO-client-0: remote operation failed: Transport endpoint is not connected
[2012-12-02 11:20:48.657921] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0xd0) [0x7f6672c145b0] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0) [0x7f6672c14220] (-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f6672c1414e]))) 0-NFS_RAID1_FO-client-0: forced unwinding frame type(GlusterFS 3.1) op(FINODELK(30)) called at 2012-12-02 11:07:13.417580 (xid=0x28455051x)
[2012-12-02 11:20:48.657955] W [client3_1-fops.c:1545:client3_1_finodelk_cbk] 0-NFS_RAID1_FO-client-0: remote operation failed: Transport endpoint is not connected
[2012-12-02 11:20:48.658003] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0xd0) [0x7f6672c145b0] (-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0) [0x7f6672c14220] (-->/usr/lib/libgfrpc.so.0 [...]

[...] and (apparently) became non-responsive over NFS long enough for the clients to disconnect.

The clients are all XenServers, and Gluster holds the VM images. So essentially, all my VMs locked up, and the Linux servers eventually remounted their filesystems read-only.

Has anyone seen this before?

Gerald