I upgraded my systems to 3.6.3, and some of my clients are now having issues connecting. I can mount using NFS without any issues. However, when I try to FUSE mount, it times out on many of my nodes. The mount succeeded on approximately 400 nodes, but the remainder timed out. Any suggestions for how to fix this?

On the client side, I am getting the following in the logs:

[2015-05-05 00:17:18.013319] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.3 (args: /usr/sbin/glusterfs --volfile-server=gfsib01a.corvidtec.com --volfile-server-transport=tcp --volfile-id=/homegfs.tcp /homegfs_test)
[2015-05-05 00:18:21.019012] E [socket.c:2276:socket_connect_finish] 0-glusterfs: connection to 10.1.70.1:24007 failed (Connection timed out)
[2015-05-05 00:18:21.019092] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: gfsib01a.corvidtec.com (Transport endpoint is not connected)
[2015-05-05 00:18:21.019100] I [glusterfsd-mgmt.c:1817:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2015-05-05 00:18:21.019224] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (1), shutting down
[2015-05-05 00:18:21.019239] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/homegfs_test'.
[2015-05-05 00:18:21.027770] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down

Logs from my server are attached...

[root at gfs01a log]# gluster volume status homegfs
Status of volume: homegfs
Gluster process                                      Port   Online  Pid
------------------------------------------------------------------------------
Brick gfsib01a.corvidtec.com:/data/brick01a/homegfs  49152  Y       3816
Brick gfsib01b.corvidtec.com:/data/brick01b/homegfs  49152  Y       3826
Brick gfsib01a.corvidtec.com:/data/brick02a/homegfs  49153  Y       3821
Brick gfsib01b.corvidtec.com:/data/brick02b/homegfs  49153  Y       3831
Brick gfsib02a.corvidtec.com:/data/brick01a/homegfs  49152  Y       3959
Brick gfsib02b.corvidtec.com:/data/brick01b/homegfs  49152  Y       3970
Brick gfsib02a.corvidtec.com:/data/brick02a/homegfs  49153  Y       3964
Brick gfsib02b.corvidtec.com:/data/brick02b/homegfs  49153  Y       3975
NFS Server on localhost                              2049   Y       3830
Self-heal Daemon on localhost                        N/A    Y       3835
NFS Server on gfsib01b.corvidtec.com                 2049   Y       3840
Self-heal Daemon on gfsib01b.corvidtec.com           N/A    Y       3845
NFS Server on gfsib02b.corvidtec.com                 2049   Y       3984
Self-heal Daemon on gfsib02b.corvidtec.com           N/A    Y       3989
NFS Server on gfsib02a.corvidtec.com                 2049   Y       3973
Self-heal Daemon on gfsib02a.corvidtec.com           N/A    Y       3978

Task Status of Volume homegfs
------------------------------------------------------------------------------
Task   : Rebalance
ID     : 58b6cc76-c29c-4695-93fe-c42b1112e171
Status : completed

[root at gfs01a log]# gluster volume info homegfs

Volume Name: homegfs
Type: Distributed-Replicate
Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
Options Reconfigured:
server.manage-gids: on
changelog.rollover-time: 15
changelog.fsync-interval: 3
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: off
storage.owner-gid: 100
network.ping-timeout: 10
server.allow-insecure: on
performance.write-behind-window-size: 128MB
performance.cache-size: 128MB
performance.io-thread-count: 32

David

=======================
David F. Robinson, Ph.D.
President - Corvid Technologies
145 Overhill Drive
Mooresville, NC 28117
704.799.6944 x101 [Office]
704.252.1310 [Cell]
704.799.7974 [Fax]
david.robinson at corvidtec.com
http://www.corvidtec.com

-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterfs.tgz
Type: application/x-compressed
Size: 1395639 bytes
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150505/7b8d5295/attachment-0001.bin>
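Worth noting from the client log above: the client did resolve gfsib01a.corvidtec.com (it logs the IP 10.1.70.1), and the TCP connection to glusterd's management port then timed out roughly a minute later, before any volume-level handshake. A quick way to reproduce that check from one of the failing clients, as a minimal sketch assuming getent and nc are installed:

    # Confirm the volfile server resolves to the same address (10.1.70.1) here too
    getent hosts gfsib01a.corvidtec.com

    # Can this client reach glusterd's management port at all?
    nc -zv -w 5 gfsib01a.corvidtec.com 24007

    # And a brick port (49152, per the volume status above)?
    nc -zv -w 5 gfsib01a.corvidtec.com 49152

If the 24007 probe hangs on exactly the clients whose FUSE mounts time out, the problem is reachability or name resolution rather than the volume configuration, which would also explain why NFS mounts through a different path still work.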
It looks like my issue was due to a change in the way name resolution is now handled in 3.6.3. I'll send in an explanation tomorrow in case anyone else is having a similar issue.

David

------ Original Message ------
From: "David Robinson" <drobinson at corvidtec.com>
To: "gluster-users at gluster.org" <gluster-users at gluster.org>; "Gluster Devel" <gluster-devel at gluster.org>
Sent: 5/4/2015 8:23:28 PM
Subject: 3.6.3 + fuse mount
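On the name-resolution theory: since the client-side lookup of the volfile server evidently succeeded, one thing worth checking (this is an assumption on my part, not something confirmed above) is whether the gluster servers can resolve the failing clients in return, and whether DNS answers are consistent across the ~400 working and remaining failing nodes. A sketch to run on one of the servers, where node0401 and 10.1.70.99 are hypothetical placeholders for a failing client:

    # Can the server resolve a failing client both ways?
    getent hosts node0401.corvidtec.com   # forward: name -> IP
    getent hosts 10.1.70.99               # reverse: IP -> name

    # If entries are missing or inconsistent, pinning them in /etc/hosts
    # on the affected machines is a common stopgap while DNS is fixed:
    # echo "10.1.70.99 node0401.corvidtec.com node0401" >> /etc/hosts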