Hi,

I'm having issues with Gluster NFS: the NFS server process keeps crashing (signal 11) after a few hours under medium load.

OS: CentOS 7.2
Gluster version: 3.7.13

##### Gluster info:

Volume Name: vlm01
Type: Distributed-Replicate
Volume ID: eacd8248-dca3-4530-9aed-7714a5a114f2
Status: Started
Number of Bricks: 7 x 3 = 21
Transport-type: tcp
Bricks:
Brick1: gfs01:/bricks/b01/vlm01
Brick2: gfs02:/bricks/b01/vlm01
Brick3: gfs03:/bricks/b01/vlm01
Brick4: gfs01:/bricks/b02/vlm01
Brick5: gfs02:/bricks/b02/vlm01
Brick6: gfs03:/bricks/b02/vlm01
Brick7: gfs01:/bricks/b03/vlm01
Brick8: gfs02:/bricks/b03/vlm01
Brick9: gfs03:/bricks/b03/vlm01
Brick10: gfs01:/bricks/b04/vlm01
Brick11: gfs02:/bricks/b04/vlm01
Brick12: gfs03:/bricks/b04/vlm01
Brick13: gfs01:/bricks/b05/vlm01
Brick14: gfs02:/bricks/b05/vlm01
Brick15: gfs03:/bricks/b05/vlm01
Brick16: gfs01:/bricks/b06/vlm01
Brick17: gfs02:/bricks/b06/vlm01
Brick18: gfs03:/bricks/b06/vlm01
Brick19: gfs01:/bricks/b07/vlm01
Brick20: gfs02:/bricks/b07/vlm01
Brick21: gfs03:/bricks/b07/vlm01
Options Reconfigured:
auth.allow: 192.168.221.50,192.168.221.51,192.168.221.52,192.168.221.56
features.shard: on
features.shard-block-size: 16MB
cluster.self-heal-window-size: 128
cluster.data-self-heal-algorithm: full
performance.write-behind: off
performance.strict-write-ordering: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
network.ping-timeout: 10

##### Gluster status:

Status of volume: vlm01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfs01:/bricks/b01/vlm01               49159     0          Y       3050
Brick gfs02:/bricks/b01/vlm01               49158     0          Y       3012
Brick gfs03:/bricks/b01/vlm01               49158     0          Y       3889
Brick gfs01:/bricks/b02/vlm01               49160     0          Y       3058
Brick gfs02:/bricks/b02/vlm01               49159     0          Y       3011
Brick gfs03:/bricks/b02/vlm01               49159     0          Y       3888
Brick gfs01:/bricks/b03/vlm01               49161     0          Y       3067
Brick gfs02:/bricks/b03/vlm01               49160     0          Y       3024
Brick gfs03:/bricks/b03/vlm01               49160     0          Y       3899
Brick gfs01:/bricks/b04/vlm01               49162     0          Y       3057
Brick gfs02:/bricks/b04/vlm01               49161     0          Y       3035
Brick gfs03:/bricks/b04/vlm01               49161     0          Y       3898
Brick gfs01:/bricks/b05/vlm01               49163     0          Y       3075
Brick gfs02:/bricks/b05/vlm01               49162     0          Y       3030
Brick gfs03:/bricks/b05/vlm01               49162     0          Y       3914
Brick gfs01:/bricks/b06/vlm01               49164     0          Y       3091
Brick gfs02:/bricks/b06/vlm01               49163     0          Y       3048
Brick gfs03:/bricks/b06/vlm01               49163     0          Y       3913
Brick gfs01:/bricks/b07/vlm01               49165     0          Y       3080
Brick gfs02:/bricks/b07/vlm01               49164     0          Y       3042
Brick gfs03:/bricks/b07/vlm01               49164     0          Y       3908
NFS Server on localhost                     2049      0          Y       28926
Self-heal Daemon on localhost               N/A       N/A        Y       28934
NFS Server on gfs02                         2049      0          Y       9944
Self-heal Daemon on gfs02                   N/A       N/A        Y       9953
NFS Server on gfs01                         2049      0          Y       46993
Self-heal Daemon on gfs01                   N/A       N/A        Y       47003

Task Status of Volume vlm01
------------------------------------------------------------------------------
There are no active volume tasks

##### dmesg:

Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: patchset: git://git.gluster.com/glusterfs.git
Jul 23 09:53:07 gfs03 nfs[31243]: signal received: 11
Jul 23 09:53:07 gfs03 nfs[31243]: time of crash:
Jul 23 09:53:07 gfs03 nfs[31243]: 2016-07-23 06:53:07
Jul 23 09:53:07 gfs03 nfs[31243]: configuration details:
Jul 23 09:53:07 gfs03 nfs[31243]: argp 1
Jul 23 09:53:07 gfs03 nfs[31243]: backtrace 1
Jul 23 09:53:07 gfs03 nfs[31243]: dlfcn 1
Jul 23 09:53:07 gfs03 nfs[31243]: libpthread 1
Jul 23 09:53:07 gfs03 nfs[31243]: llistxattr 1
Jul 23 09:53:07 gfs03 nfs[31243]: setfsid 1
Jul 23 09:53:07 gfs03 nfs[31243]: spinlock 1
Jul 23 09:53:07 gfs03 nfs[31243]: epoll.h 1
Jul 23 09:53:07 gfs03 nfs[31243]: xattr.h 1
Jul 23 09:53:07 gfs03 nfs[31243]: st_atim.tv_nsec 1
Jul 23 09:53:07 gfs03 nfs[31243]: package-string: glusterfs 3.7.13

##### nfs.log:

[2016-07-23 05:59:19.961654] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-18: Connected to vlm01-client-18, attached to remote volume '/bricks/b07/vlm01'.
[2016-07-23 05:59:19.961670] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-18: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.961717] I [MSGID: 108005] [afr-common.c:4142:afr_notify] 0-vlm01-replicate-6: Subvolume 'vlm01-client-18' came back up; going online.
[2016-07-23 05:59:19.961854] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-18: Server lk version = 1
[2016-07-23 05:59:19.962027] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-vlm01-client-20: changing port to 49164 (from 0)
[2016-07-23 05:59:19.964637] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-vlm01-client-19: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-23 05:59:19.965956] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-19: Connected to vlm01-client-19, attached to remote volume '/bricks/b07/vlm01'.
[2016-07-23 05:59:19.965989] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-19: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.966140] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-19: Server lk version = 1
[2016-07-23 05:59:19.967605] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-vlm01-client-20: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-23 05:59:19.967919] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-20: Connected to vlm01-client-20, attached to remote volume '/bricks/b07/vlm01'.
[2016-07-23 05:59:19.967943] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-20: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.968107] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-20: Server lk version = 1
[2016-07-23 05:59:19.973053] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-17: Connected to vlm01-client-17, attached to remote volume '/bricks/b06/vlm01'.
[2016-07-23 05:59:19.973081] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-17: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.973582] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-17: Server lk version = 1
[2016-07-23 05:59:19.974557] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-0: selecting local read_child vlm01-client-2
[2016-07-23 05:59:19.976100] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-1: selecting local read_child vlm01-client-5
[2016-07-23 05:59:19.976161] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-2: selecting local read_child vlm01-client-8
[2016-07-23 05:59:19.976583] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-3: selecting local read_child vlm01-client-11
[2016-07-23 05:59:19.976640] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-4: selecting local read_child vlm01-client-14
[2016-07-23 05:59:19.976676] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-5: selecting local read_child vlm01-client-17
[2016-07-23 05:59:19.976879] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-6: selecting local read_child vlm01-client-20
[2016-07-23 05:59:36.360646] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx~ (hash=vlm01-replicate-0/cache=vlm01-replicate-0) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx (hash=vlm01-replicate-2/cache=vlm01-replicate-0)
[2016-07-23 05:59:36.962314] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15_1.vmdk~ (hash=vlm01-replicate-6/cache=vlm01-replicate-6) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15_1.vmdk (hash=vlm01-replicate-3/cache=vlm01-replicate-6)
[2016-07-23 05:59:37.019564] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx~ (hash=vlm01-replicate-0/cache=vlm01-replicate-0) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx (hash=vlm01-replicate-2/cache=vlm01-replicate-0)
The message "I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx~ (hash=vlm01-replicate-0/cache=vlm01-replicate-0) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx (hash=vlm01-replicate-2/cache=vlm01-replicate-0)" repeated 2 times between [2016-07-23 05:59:37.019564] and [2016-07-23 05:59:37.421227]
[2016-07-23 05:59:38.757822] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx~ (hash=vlm01-replicate-0/cache=vlm01-replicate-0) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx (hash=vlm01-replicate-2/cache=vlm01-replicate-0)
[2016-07-23 05:59:39.950960] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmdk~ (hash=vlm01-replicate-5/cache=vlm01-replicate-5) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmdk (hash=vlm01-replicate-5/cache=vlm01-replicate-5)
[2016-07-23 06:00:03.048266] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmdk~ (hash=vlm01-replicate-2/cache=vlm01-replicate-2) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmdk (hash=vlm01-replicate-5/cache=vlm01-replicate-2)
[2016-07-23 06:00:07.994953] W [MSGID: 112199] [nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 4/PRTG 4.vmx => (XID: 8439cb9c, LOOKUP: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000
[2016-07-23 06:01:02.831132] E [MSGID: 112069] [nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: No such file or directory: ( 192.168.208.85:676) vlm01 : a0d6a061-866e-4b75-b3ab-4005e52ed364
[2016-07-23 06:16:48.221237] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-12: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 [File exists]
[2016-07-23 06:16:48.221231] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-13: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 [File exists]
[2016-07-23 06:16:48.221382] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-14: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 [File exists]
[2016-07-23 06:16:48.221878] W [MSGID: 112199] [nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 => (XID: 8441a50a, CREATE: NFS: 17(File exists), POSIX: 17(File exists)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000
[2016-07-23 06:17:11.343148] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-18: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94 [File exists]
[2016-07-23 06:17:11.343170] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-20: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94 [File exists]
[2016-07-23 06:17:11.343234] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-19: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94 [File exists]
[2016-07-23 06:17:11.343596] W [MSGID: 112199] [nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94 => (XID: 51e43a2f, CREATE: NFS: 17(File exists), POSIX: 17(File exists)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000 [Invalid argument]
[2016-07-23 06:17:21.393996] E [MSGID: 112069] [nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: No such file or directory: ( 192.168.208.86:906) vlm01 : a0d6a061-866e-4b75-b3ab-4005e52ed364
[2016-07-23 06:50:11.441462] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-19: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0 [File exists]
[2016-07-23 06:50:11.441471] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-18: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0 [File exists]
[2016-07-23 06:50:11.441530] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-20: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0 [File exists]
[2016-07-23 06:50:11.441959] W [MSGID: 112199] [nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0 => (XID: 51ea9a6e, CREATE: NFS: 17(File exists), POSIX: 17(File exists)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000
[2016-07-23 06:50:21.712570] E [MSGID: 112069] [nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: No such file or directory: ( 192.168.208.86:906) vlm01 : a0d6a061-866e-4b75-b3ab-4005e52ed364
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-07-23 06:53:07
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.13
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f74cbde32f2]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f74cbe08aad]
/lib64/libc.so.6(+0x35670)[0x7f74ca4cf670]
/lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x7f74cac50210]

I'd appreciate your help, guys.
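When reading the nfs.log above, it helps to know which brick each `vlm01-client-N` and `vlm01-replicate-K` name refers to. The sketch below assumes Gluster's usual convention: client indices are 0-based in `gluster volume info` brick order, and each run of three consecutive bricks forms one replica set. The helper name `describe_client` is hypothetical. Under that assumption the gfs03 "selecting local read_child" lines all resolve to gfs03 bricks, and the "Connected to ... attached to remote volume" lines match the brick paths, which suggests the mapping is right for this volume.

```python
from __future__ import annotations

# Hypothetical helper: maps the 0-based "vlm01-client-N" index seen in nfs.log
# to the brick it talks to and the replica set ("vlm01-replicate-K") it is in.
# Assumption: bricks are indexed in `gluster volume info` order and grouped
# into consecutive replica sets of size REPLICA.

REPLICA = 3

# Brick list rebuilt from `gluster volume info vlm01` (Brick1 .. Brick21):
# for each of b01..b07, one brick on gfs01, gfs02, gfs03 in that order.
BRICKS = [
    f"{host}:/bricks/b{n:02d}/vlm01"
    for n in range(1, 8)
    for host in ("gfs01", "gfs02", "gfs03")
]


def describe_client(index: int) -> tuple[str, str]:
    """Return (brick, replica-set name) for vlm01-client-<index>."""
    brick = BRICKS[index]  # client-N corresponds to Brick(N+1)
    replica_set = f"vlm01-replicate-{index // REPLICA}"
    return brick, replica_set


if __name__ == "__main__":
    # The three clients that reported "File exists" for .lck-c7607978ef6c5b99
    for idx in (12, 13, 14):
        brick, rset = describe_client(idx)
        print(f"vlm01-client-{idx}: {brick} ({rset})")
```

With this mapping, the three `File exists` failures for `.lck-c7607978ef6c5b99` (clients 12, 13, 14) all come from the b05 replica set, `vlm01-replicate-4`.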
Respectfully,
*Mahdi A. Mahdi*