Liam Slusser
2010-Dec-04 00:25 UTC
[Gluster-users] glusterfs client waiting on SYN_SENT to connect...
Hey all,

I've run into a weird problem. I have a few client boxes that occasionally crash due to a non-Gluster-related problem, but once a box comes back up I cannot get the Gluster client to reconnect to the bricks. This is CentOS 5 64-bit with Gluster 2.0.9.

df shows:

df: `/mnt/mymount': Transport endpoint is not connected

[root at client~]# netstat -pan|grep glus
tcp 0 1 10.8.10.107:1000 10.8.11.102:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:1001 10.8.11.102:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:998 10.8.11.102:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:996 10.8.11.102:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:1003 10.8.11.101:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:1002 10.8.11.101:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:997 10.8.11.101:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:999 10.8.11.101:6996 SYN_SENT 3385/glusterfs

From the gluster client log:

+------------------------------------------------------------------------------+
[2010-12-03 15:48:28] W [glusterfsd.c:526:_log_if_option_is_invalid] readahead: option 'page-size' is not recognized
[2010-12-03 15:48:28] N [glusterfsd.c:1306:main] glusterfs: Successfully started
[2010-12-03 15:48:29] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 2: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:48:30] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 3: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:48:31] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 4: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:48:31] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 5: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:48:32] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 6: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick1a: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick1a: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick2a: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick2a: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick1b: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick1b: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick2b: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick2b: connection to failed (Connection timed out)
[2010-12-03 15:59:46] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 7: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:59:47] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 8: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:59:54] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 9: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:59:55] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 10: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:59:55] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 11: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:59:55] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 12: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:59:56] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 13: ERR => -1 (Transport endpoint is not connected)

However, the port is obviously open...

[root at client~]# telnet 10.8.11.102 6996
Trying 10.2.56.102...
Connected to glusterserverb (10.8.11.102).
Escape character is '^]'.
^]
telnet> close
Connection closed.
The Gluster server log doesn't show ANY connection attempts from the client, yet it DOES show my telnet TCP connections. I'm using IP addresses in all my configuration files, no hostnames. I do have a Juniper firewall between the two servers doing stateful firewalling, and I've set it up so the connections never time out; I've never had a problem once the client finally connects. And I can open a new connection with telnet, just not with the Gluster client...

Anybody seen anything like this before? Ideas?

thanks,
liam
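One way to narrow down the telnet-versus-client difference is to repeat the handshake the way the glusterfs client does it. In the netstat output above, the client connects from privileged source ports (996 to 1003), whereas telnet uses an ephemeral port. The Python 3 sketch below is a hypothetical diagnostic, not part of Gluster: `probe_tcp` is a name made up for illustration, and the idea that the source port matters to the stateful firewall is an assumption to test, not a confirmed cause. Binding a port below 1024 requires root.

```python
import errno
import socket


def probe_tcp(host, port, source_port=0, timeout=5.0):
    """Try one TCP handshake to host:port and report the outcome.

    Hypothetical helper. source_port=0 lets the kernel pick an ephemeral
    port, as telnet does; a value below 1024 mimics the privileged source
    ports the glusterfs client used in netstat (binding one needs root).
    Returns "connected", "refused", "timeout", or an errno name/message.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        if source_port:
            # SO_REUSEADDR allows rebinding a port still in TIME_WAIT.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind(("", source_port))
        sock.connect((host, port))
        return "connected"
    except socket.timeout:
        # No SYN-ACK arrived in time: the same symptom as SYN_SENT.
        return "timeout"
    except ConnectionRefusedError:
        # The peer answered with a RST, so the path itself works.
        return "refused"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()
```

For example, as root on the client, compare `probe_tcp("10.8.11.102", 6996)` (ephemeral source port, like telnet) with `probe_tcp("10.8.11.102", 6996, source_port=1010)` (any privileged port that is not already in use). If the first returns "connected" and the second "timeout", the source port may be worth looking at on the firewall side.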
Liam Slusser
2010-Dec-04 02:03 UTC
[Gluster-users] glusterfs client waiting on SYN_SENT to connect...
I thought the exact same thing... but like I said, I can telnet to the host/port without any issue. There are no other issues on the network that would indicate anything is not working correctly, and all the other clients on the same network/switch are working fine. It's only when a client crashes...

liam

On Fri, Dec 3, 2010 at 4:34 PM, <mki at mozone.net> wrote:
>> I've run into a weird problem. I have a few client boxes that
>> occasionally crash due to a non-gluster related problem. But once the
>> box comes back up i cannot get the Gluster client to reconnect to the
>> bricks.
>
> This almost seems like a networking/firewall issue... Do you have
> any trunks setup between the switch that the client and/or server
> are on and the router? Perhaps one of those trunk legs is down
> causing random packets to get blackholed?
>
> Mohan
>
mki-glusterfs at mozone.net
2010-Dec-04 02:32 UTC
[Gluster-users] glusterfs client waiting on SYN_SENT to connect...
On Fri, Dec 03, 2010 at 04:25:18PM -0800, Liam Slusser wrote:
> [root at client~]# netstat -pan|grep glus
> tcp 0 1 10.8.10.107:1000 10.8.11.102:6996 SYN_SENT 3385/glusterfs
>
> from the gluster client log:
>
> However, the port is obviously open...
>
> [root at client~]# telnet 10.8.11.102 6996
> Trying 10.2.56.102...
> Connected to glusterserverb (10.8.11.102).
> Escape character is '^]'.
> ^]
> telnet> close
> Connection closed.

Looking further... why is your telnet trying 10.2.56.102 when you clearly specified 10.8.11.102?

Also, what happens if you add a specific route for the 10.8.11.0/24 block through the appropriate gateway instead of relying on the default gateway to route for you? That way you don't end up in a situation where the client is mistakenly trying to go over the wrong interface. Could telnet be switching to an alternate interface to see if it gets through?

Mohan
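The wrong-interface theory can be checked from the client before adding any routes. The minimal Python 3 sketch below (the function name `source_ip_for` is hypothetical) asks the kernel which local address it would use to reach a destination: calling connect() on a UDP socket sends no packets, it only makes the kernel select a route, and getsockname() then reports the chosen source address. On Linux, `ip route get 10.8.11.102` shows the same routing decision, including the outgoing interface.

```python
import socket


def source_ip_for(dest_ip, port=6996):
    """Return the local IPv4 address the routing table picks for dest_ip.

    Hypothetical helper. connect() on a UDP socket transmits nothing; it
    only binds the socket to the source address of the selected route.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((dest_ip, port))
        return sock.getsockname()[0]
    finally:
        sock.close()
```

If `source_ip_for("10.8.11.102")` returns something other than 10.8.10.107 (the address the glusterfs client was using in netstat), the routing table is the place to look, and a static route for 10.8.11.0/24 via the correct gateway would be the next thing to try.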