Alessandro De Salvo
2013-May-23 08:17 UTC
[Gluster-users] Crash in glusterfsd 3.4.0 beta1 and "Transport endpoint is not connected"
Hi,

I have a replicated volume across two Fedora 18 machines using glusterfs 3.4.0 beta1 from rawhide. All is fine with glusterd, and the replication is performed correctly, but every time I try to access any file from the fuse mounts I see errors like the following in /var/log/glusterfs/<mountpoint>.log, leading to "Transport endpoint is not connected", so the filesystems get unmounted:

[2013-05-23 08:06:24.302332] I [afr-common.c:3709:afr_notify] 0-adsroma1-gluster-data01-replicate-0: Subvolume 'adsroma1-gluster-data01-client-1' came back up; going online.
[2013-05-23 08:06:24.302706] I [client-handshake.c:450:client_set_lk_version_cbk] 0-adsroma1-gluster-data01-client-1: Server lk version = 1
[2013-05-23 08:06:24.316318] I [client-handshake.c:1658:select_server_supported_programs] 0-adsroma1-gluster-data01-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-05-23 08:06:24.336718] I [client-handshake.c:1456:client_setvolume_cbk] 0-adsroma1-gluster-data01-client-0: Connected to 127.0.0.1:49157, attached to remote volume '/gluster/data01/files'.
[2013-05-23 08:06:24.336732] I [client-handshake.c:1468:client_setvolume_cbk] 0-adsroma1-gluster-data01-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2013-05-23 08:06:24.344178] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-05-23 08:06:24.344372] I [client-handshake.c:450:client_set_lk_version_cbk] 0-adsroma1-gluster-data01-client-0: Server lk version = 1
[2013-05-23 08:06:24.344502] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-05-23 08:06:24.345008] I [afr-common.c:2059:afr_set_root_inode_on_first_lookup] 0-adsroma1-gluster-data01-replicate-0: added root inode
[2013-05-23 08:06:24.345240] I [afr-common.c:2122:afr_discovery_cbk] 0-adsroma1-gluster-data01-replicate-0: selecting local read_child adsroma1-gluster-data01-client-0
pending frames:
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-05-23 08:08:20
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0beta1
/usr/lib64/libc.so.6[0x3c51035b50]
/usr/lib64/glusterfs/3.4.0beta1/xlator/performance/io-cache.so(ioc_open_cbk+0x8b)[0x7fb93cd2bc4b]
/usr/lib64/glusterfs/3.4.0beta1/xlator/performance/read-ahead.so(ra_open_cbk+0x1c1)[0x7fb93cf3a951]
/usr/lib64/glusterfs/3.4.0beta1/xlator/cluster/distribute.so(dht_open_cbk+0xe0)[0x7fb93d37f890]
/usr/lib64/glusterfs/3.4.0beta1/xlator/cluster/replicate.so(afr_open_cbk+0x29c)[0x7fb93d5bf60c]
/usr/lib64/glusterfs/3.4.0beta1/xlator/protocol/client.so(client3_3_open_cbk+0x174)[0x7fb93d82f5c4]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x3c5300e880]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x101)[0x3c5300ea81]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x3c5300b0d3]
/usr/lib64/glusterfs/3.4.0beta1/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7fb93eefa6a4]
/usr/lib64/glusterfs/3.4.0beta1/rpc-transport/socket.so(socket_event_handler+0x11c)[0x7fb93eefa9dc]
/usr/lib64/libglusterfs.so.0[0x3c5285923b]
/usr/sbin/glusterfs(main+0x3a4)[0x4049d4]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x3c51021c35]
/usr/sbin/glusterfs[0x404d49]
---------

The volume is defined as follows:

Volume Name: adsroma1-gluster-data01
Type: Replicate
Volume ID: 1ca608c7-8a9d-4d8c-ac05-fabc2d2c2565
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: pc-ads-02.roma1.infn.it:/gluster/data01/files
Brick2: pc-ads-03.roma1.infn.it:/gluster/data01/files

Is this a known problem with this beta version? Any hint?

Thanks,

Alessandro
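For readers triaging a crash dump like the one above, here is a minimal Python sketch that splits the backtrace lines into module, symbol, offset, and address, and picks out the first frame that lives in a translator (a module under an `xlator/` directory). The helper names `parse_backtrace` and `first_xlator_frame` are made up for illustration, and the frame layout is assumed from this log only; other builds may print frames differently.

```python
import re

# Frame layout assumed from the crash dump in this post, e.g.
#   /usr/lib64/.../io-cache.so(ioc_open_cbk+0x8b)[0x7fb93cd2bc4b]
#   /usr/lib64/libc.so.6[0x3c51035b50]
FRAME_RE = re.compile(
    r"^(?P<module>/\S+?)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_backtrace(text):
    """Return a list of dicts, one per line that looks like a backtrace frame."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.groupdict())
    return frames


def first_xlator_frame(frames):
    """Return the first frame whose module path contains /xlator/, or None."""
    for frame in frames:
        if "/xlator/" in frame["module"]:
            return frame
    return None


# A few lines copied from the log above (hypothetical variable name).
SAMPLE = """\
signal received: 11
/usr/lib64/libc.so.6[0x3c51035b50]
/usr/lib64/glusterfs/3.4.0beta1/xlator/performance/io-cache.so(ioc_open_cbk+0x8b)[0x7fb93cd2bc4b]
/usr/lib64/glusterfs/3.4.0beta1/xlator/performance/read-ahead.so(ra_open_cbk+0x1c1)[0x7fb93cf3a951]
/usr/lib64/glusterfs/3.4.0beta1/xlator/cluster/distribute.so(dht_open_cbk+0xe0)[0x7fb93d37f890]
"""

if __name__ == "__main__":
    top = first_xlator_frame(parse_backtrace(SAMPLE))
    print(top["module"], top["symbol"], top["offset"])
```

On the trace above, the first translator frame is `ioc_open_cbk` in `io-cache.so`. That only shows where the signal was raised; it does not by itself establish which translator is at fault.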
Daniel Müller
2013-May-23 09:07 UTC
[Gluster-users] Crash in glusterfsd 3.4.0 beta1 and "Transport endpoint is not connected"
I had the same problem with Gluster 3.2 when syncing two bricks. Look at your glusterfs-export.log, mout-glusterfs.log, or something similar. In my case, a few files caused the issue:

--> [2013-04-25 12:36:19.127124] E [afr-self-heal-metadata.c:521:afr_sh_metadata_fix] 0-sambavol-replicate-0: Unable to self-heal permissions/ownership of '/windows/winuser/xxxxx/xxx/xxx/xxx 2013/xxx.xls' (possible split-brain). Please fix the file on all backend volumes

After removing these files, everything was up and running again.

Good luck,
Daniel

-----------------------------------------------
EDV Daniel Müller
Head of IT (Leitung EDV)
Tropenklinik Paul-Lechler-Krankenhaus
Paul-Lechler-Str. 24
72076 Tübingen
Tel.: 07071/206-463, Fax: 07071/206-499
eMail: mueller at tropenklinik.de
Internet: www.tropenklinik.de
-----------------------------------------------
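To find files flagged the way Daniel describes, one option is to scan the log for error-level entries that mention a possible split-brain. The Python sketch below assumes the line layout and message wording seen in the 3.2 log line quoted in this reply; other GlusterFS versions may word the message differently. The function name `split_brain_paths` is hypothetical.

```python
import re

# Log line layout assumed from the entries quoted in this thread:
#   [timestamp] LEVEL [file.c:line:function] xlator-name: message
LOG_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<source>[^\]]+)\]\s+(?P<xlator>\S+):\s+(?P<message>.*)$"
)

# Message wording assumed from the 3.2 self-heal error quoted above.
PATH_RE = re.compile(r"of '(?P<path>.+)' \(possible split-brain\)")


def split_brain_paths(lines):
    """Return the distinct file paths named in error-level split-brain messages."""
    paths = []
    for line in lines:
        entry = LOG_RE.match(line.strip())
        if entry is None or entry.group("level") != "E":
            continue  # skip non-log lines and anything that is not an error
        hit = PATH_RE.search(entry.group("message"))
        if hit and hit.group("path") not in paths:
            paths.append(hit.group("path"))
    return paths


# Two lines copied from this thread (hypothetical variable name).
SAMPLE_LINES = [
    "[2013-05-23 08:06:24.344178] I [fuse-bridge.c:4723:fuse_graph_setup] "
    "0-fuse: switched to graph 0",
    "[2013-04-25 12:36:19.127124] E [afr-self-heal-metadata.c:521:afr_sh_metadata_fix] "
    "0-sambavol-replicate-0: Unable to self-heal permissions/ownership of "
    "'/windows/winuser/xxxxx/xxx/xxx/xxx 2013/xxx.xls' (possible split-brain). "
    "Please fix the file on all backend volumes",
]

if __name__ == "__main__":
    for path in split_brain_paths(SAMPLE_LINES):
        print(path)
```

To run it against a real log, pass an open file object instead of `SAMPLE_LINES`, since the function only iterates over lines.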