I have been having issues with Gluster for the past couple of weeks on my scratch server (mostly stability). Today, Gluster keeps crashing and will only stay up for a few seconds at a time.

The NFS log file contains these messages:

READDIRP(40)) called at 2014-05-20 14:56:58.933291
[2014-05-20 15:01:51.568960] E [client3_1-fops.c:1937:client3_1_readdirp_cbk] 0-hpcscratch-client-0: remote operation failed: Transport endpoint is not connected
[2014-05-20 15:01:51.568994] E [rpc-clnt.c:341:saved_frames_unwind] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x7f193f7b0729] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x7f193f7afeee] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f193f7afe5e]))) 0-hpcscratch-client-0: forced unwinding frame type(GlusterFS 3.1) op(READDIRP(40)) called at 2014-05-20 14:56:58.933348
[2014-05-20 15:01:51.569007] E [client3_1-fops.c:1937:client3_1_readdirp_cbk] 0-hpcscratch-client-0: remote operation failed: Transport endpoint is not connected
[2014-05-20 15:01:51.569043] E [rpc-clnt.c:341:saved_frames_unwind] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x7f193f7b0729] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x7f193f7afeee] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f193f7afe5e]))) 0-hpcscratch-client-0: forced unwinding frame type(GlusterFS 3.1) op(READDIRP(40)) called at 2014-05-20 14:56:58.933405
[2014-05-20 15:01:51.569057] E [client3_1-fops.c:1937:client3_1_readdirp_cbk] 0-hpcscratch-client-0: remote operation failed: Transport endpoint is not connected
[2014-05-20 15:01:51.569484] E [rpc-clnt.c:341:saved_frames_unwind] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x7f193f7b0729] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x7f193f7afeee] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f193f7afe5e]))) 0-hpcscratch-client-0: forced unwinding frame type(GlusterFS 3.1) op(READDIRP(40)) called at 2014-05-20 14:56:58.933484
[2014-05-20 15:01:51.569487] W [rpc-clnt.c:1417:rpc_clnt_submit] 0-hpcscratch-client-0: failed to submit rpc-request (XID: 0x289032x Program: GlusterFS 3.1, ProgVers: 310, Proc: 20) to rpc-transport (hpcscratch-client-0)
[2014-05-20 15:01:51.569513] E [client3_1-fops.c:1937:client3_1_readdirp_cbk] 0-hpcscratch-client-0: remote operation failed: Transport endpoint is not connected
[2014-05-20 15:01:51.569538] E [client3_1-fops.c:2132:client3_1_opendir_cbk] 0-hpcscratch-client-0: remote operation failed: Transport endpoint is not connected
[2014-05-20 15:01:51.569570] W [glusterfsd.c:727:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x30e3ce5ccd] (-->/lib64/libpthread.so.0() [0x30e40077f1] (-->/opt/glusterfs/3.2.6/sbin/glusterfs(glusterfs_sigwaiter+0x17c) [0x40477c]))) 0-: received signum (15), shutting down
[2014-05-20 15:01:51.569585] E [rpc-clnt.c:341:saved_frames_unwind] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x7f193f7b0729] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x7f193f7afeee] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f193f7afe5e]))) 0-hpcscratch-client-0: forced unwinding frame type(GlusterFS 3.1) op(READDIRP(40)) called at 2014-05-20 14:56:58.933615
[2014-05-20 15:01:51.569631] E [client3_1-fops.c:1937:client3_1_readdirp_cbk] 0-hpcscratch-client-0: remote operation failed: Transport endpoint is not connected
[2014-05-20 15:01:51.569660] W [rpc-clnt.c:1417:rpc_clnt_submit] 0-hpcscratch-client-0: failed to submit rpc-request (XID: 0x289033x Program: GlusterFS 3.1, ProgVers: 310, Proc: 20) to rpc-transport (hpcscratch-client-0)
[2014-05-20 15:01:51.569677] E [rpc-clnt.c:341:saved_frames_unwind] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x7f193f7b0729] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x7f193f7afeee] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f193f7afe5e]))) 0-hpcscratch-client-0: forced unwinding frame type(GlusterFS 3.1) op(READDIRP(40)) called at 2014-05-20 14:56:58.933694
[2014-05-20 15:01:51.569707] E [client3_1-fops.c:1937:client3_1_readdirp_cbk] 0-hpcscratch-client-0: remote operation failed: Transport endpoint is not connected
[2014-05-20 15:01:51.569688] E [client3_1-fops.c:2132:client3_1_opendir_cbk] 0-hpcscratch-client-0: remote operation failed: Transport endpoint is not connected
[2014-05-20 15:01:51.569775] E [rpc-clnt.c:341:saved_frames_unwind] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x7f193f7b0729] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x7f193f7afeee] (-->/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f193f7afe5e]))) 0-hpcscratch-client-0: forced unwinding frame type(GlusterFS 3.1) op(READDIRP(40)) called at 2014-05-20 14:56:58.933746
[2014-05-20 15:01:51.569795] E [client3_1-fops.c:1937:client3_1_readdirp_cbk] 0-hpcscratch-client-0: remote operation failed: Transport endpoint is not connected
[2014-05-20 15:01:52.687996] I [glusterfsd.c:1493:main] 0-/opt/glusterfs/3.2.6/sbin/glusterfs: Started running /opt/glusterfs/3.2.6/sbin/glusterfs version 3.2.6

I have also noticed this in the volume log file:

[2014-05-20 15:01:52.720687] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.50.173.147:1016)

Does anyone know what could be causing this, or have any suggestions for fixing it?

__________________________
Mike Jarsulic
Sr. System Administrator
Center for Research Informatics | University of Chicago
Phone: (773) 702-2066
Email: mjarsulic at bsd.uchicago.edu