Liam Slusser
2010-Dec-04 00:25 UTC
[Gluster-users] glusterfs client waiting on SYN_SENT to connect...
Hey all,
I've run into a weird problem. I have a few client boxes that
occasionally crash due to a non-Gluster-related problem. But once the
box comes back up, I cannot get the Gluster client to reconnect to the
bricks.
CentOS 5 64-bit and Gluster 2.0.9.
df shows:
df: `/mnt/mymount': Transport endpoint is not connected
[root@client~]# netstat -pan|grep glus
tcp 0 1 10.8.10.107:1000 10.8.11.102:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:1001 10.8.11.102:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:998 10.8.11.102:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:996 10.8.11.102:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:1003 10.8.11.101:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:1002 10.8.11.101:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:997 10.8.11.101:6996 SYN_SENT 3385/glusterfs
tcp 0 1 10.8.10.107:999 10.8.11.101:6996 SYN_SENT 3385/glusterfs
from the gluster client log:
+------------------------------------------------------------------------------+
[2010-12-03 15:48:28] W [glusterfsd.c:526:_log_if_option_is_invalid] readahead: option 'page-size' is not recognized
[2010-12-03 15:48:28] N [glusterfsd.c:1306:main] glusterfs: Successfully started
[2010-12-03 15:48:29] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 2: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:48:30] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 3: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:48:31] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 4: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:48:31] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 5: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:48:32] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 6: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick1a: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick1a: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick2a: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick2a: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick1b: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick1b: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick2b: connection to failed (Connection timed out)
[2010-12-03 15:51:37] E [socket.c:745:socket_connect_finish] brick2b: connection to failed (Connection timed out)
[2010-12-03 15:59:46] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 7: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:59:47] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 8: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:59:54] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 9: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:59:55] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 10: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:59:55] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 11: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:59:55] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 12: ERR => -1 (Transport endpoint is not connected)
[2010-12-03 15:59:56] W [fuse-bridge.c:1892:fuse_statfs_cbk] glusterfs-fuse: 13: ERR => -1 (Transport endpoint is not connected)
However, the port is obviously open...
[root@client~]# telnet 10.8.11.102 6996
Trying 10.2.56.102...
Connected to glusterserverb (10.8.11.102).
Escape character is '^]'.
^]
telnet> close
Connection closed.
The Gluster server log doesn't show ANY connection attempts from the
client; however, it DOES show my telnet TCP connections. I'm using IP
addresses in all my configuration files, no names. I do have a
Juniper firewall between the two servers that is doing stateful
firewalling, and I've set it up so the connections never time out,
and I've never had a problem once it finally connects. And I can
create a new connection with telnet but not with the client...
Anybody seen anything like this before? Ideas?
thanks,
liam
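One difference visible in the netstat output above: the glusterfs client connects from source ports below 1024 (996-1003), whereas telnet picks an ephemeral source port. Whether that matters here is only an assumption, but it is easy to test. A minimal Python sketch follows; the `probe` helper is hypothetical and not part of Gluster, and binding below 1024 normally needs root:

```python
import socket


def probe(host, port, source_port=0, timeout=5.0):
    """Try a TCP connect to (host, port), optionally from a fixed source port.

    Returns (True, local_address) on success or (False, error_text) on failure.
    source_port=0 lets the kernel pick an ephemeral port, as telnet does.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # Allow a recently used source port to be bound again.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", source_port))
        sock.connect((host, port))
        return True, sock.getsockname()
    except OSError as exc:
        return False, str(exc)
    finally:
        sock.close()


if __name__ == "__main__":
    # Destination and source ports taken from the netstat output above.
    for sport in (0, 996, 997, 998, 999, 1000, 1001, 1002, 1003):
        ok, detail = probe("10.8.11.102", 6996, source_port=sport)
        label = sport if sport else "ephemeral"
        print(label, "OK" if ok else "FAILED", detail)
```

If the ephemeral-port connect succeeds while the low-port connects time out, that would point at something between client and server treating those source ports differently; if all of them succeed, this line of inquiry can be dropped.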
Liam Slusser
2010-Dec-04 02:03 UTC
[Gluster-users] glusterfs client waiting on SYN_SENT to connect...
I thought the exact same thing... but like I said, I can telnet to the host/port without any issue. And there are no other issues on the network that would indicate anything not working correctly. And all the other clients on the same network/switch are working fine. It's only when a client crashes...
liam
On Fri, Dec 3, 2010 at 4:34 PM, <mki at mozone.net> wrote:
>> I've run into a weird problem. I have a few client boxes that
>> occasionally crash due to a non-gluster related problem. But once the
>> box comes back up i cannot get the Gluster client to reconnect to the
>> bricks.
>
> This almost seems like a networking/firewall issue... Do you have
> any trunks setup between the switch that the client and/or server
> are on and the router? Perhaps one of those trunk legs is down
> causing random packets to get blackholed?
>
> Mohan
mki-glusterfs at mozone.net
2010-Dec-04 02:32 UTC
[Gluster-users] glusterfs client waiting on SYN_SENT to connect...
On Fri, Dec 03, 2010 at 04:25:18PM -0800, Liam Slusser wrote:
> [root@client~]# netstat -pan|grep glus
> tcp 0 1 10.8.10.107:1000 10.8.11.102:6996 SYN_SENT 3385/glusterfs
>
> from the gluster client log:
>
> However, the port is obviously open...
>
> [root@client~]# telnet 10.8.11.102 6996
> Trying 10.2.56.102...
> Connected to glusterserverb (10.8.11.102).
> Escape character is '^]'.
> ^]
> telnet> close
> Connection closed.
Looking further... why is your telnet trying 10.2.56.102 when you clearly specified 10.8.11.102?
Also, what happens if you add a specific route for the 10.8.11.0/24 block through the appropriate gateway without relying on the default gateway to route for you? That way you don't end up in a situation where the client is mistakenly trying to go over the wrong interface. The telnet may be switching to an alternate interface to see if it gets through?
Mohan
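A quick way to check the wrong-interface theory is to ask the kernel which local address it would use to reach each brick. The sketch below is only an illustration; `source_address_for` is a hypothetical helper name. Calling connect() on a UDP socket sends no packets and only triggers a route lookup. Going by the netstat output quoted above, the expected answer for both bricks would be 10.8.10.107; any other address would suggest traffic leaving via a different interface.

```python
import socket


def source_address_for(dest_ip, dest_port=6996):
    """Return the local IP the kernel would use as source to reach dest_ip.

    connect() on a UDP socket transmits nothing; it only makes the kernel
    resolve the route and select a source address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((dest_ip, dest_port))
        return sock.getsockname()[0]
    finally:
        sock.close()


if __name__ == "__main__":
    # Brick addresses taken from the netstat output quoted above.
    for brick in ("10.8.11.101", "10.8.11.102"):
        print(brick, "is reached from", source_address_for(brick))
```

This only reports the source address the routing table selects; it does not show what a firewall in the path does with the packets.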