lierihanmei
2013-Nov-06 16:33 UTC
[Gluster-users] df hang while a brick server down (rdma transport)
Hi all,

We are using glusterfs on a cluster of several servers connected with Infiniband. When using the rdma transport, if one of these servers is down, "mount" works fine but commands such as "df" hang.

These are the steps to reproduce:

- create a volume of 3 bricks with rdma transport, each brick on a different server
- start the volume
- take one brick server down
- after mounting the volume, "df -h" will hang

We have tested glusterfs 3.2.5 and 3.2.7; both have this problem.

Thanks for any help. Let me know if there's anything else I can provide!

Here is a piece of the glusterfs log:

......
[2013-10-22 11:07:41.335497] I [client-handshake.c:1090:select_server_supported_programs] 0-hash-01-client-0: Using Program GlusterFS 3.0.0, Num (1298437), Version (310)
[2013-10-22 11:07:41.335618] I [client-handshake.c:1090:select_server_supported_programs] 0-hash-01-client-2: Using Program GlusterFS 3.0.0, Num (1298437), Version (310)
[2013-10-22 11:07:41.335995] I [client-handshake.c:913:client_setvolume_cbk] 0-hash-01-client-0: Connected to 192.168.20.107:24013, attached to remote volume '/data/brick1'.
[2013-10-22 11:07:41.336119] I [client-handshake.c:913:client_setvolume_cbk] 0-hash-01-client-2: Connected to 192.168.20.108:24014, attached to remote volume '/data/brick1'.
[2013-10-22 11:07:41.591835] E [rdma.c:4417:tcp_connect_finish] 0-hash-01-client-1: tcp connect to failed (No route to host)
[2013-10-22 11:07:41.591950] E [rdma.c:4417:tcp_connect_finish] 0-hash-01-client-1: tcp connect to failed (No route to host)
[2013-10-22 11:07:41.591996] E [rdma.c:4417:tcp_connect_finish] 0-hash-01-client-1: tcp connect to failed (No route to host)
[2013-10-22 11:07:44.592810] E [rdma.c:4417:tcp_connect_finish] 0-hash-01-client-1: tcp connect to failed (No route to host)
[2013-10-22 11:07:44.592917] E [rdma.c:4417:tcp_connect_finish] 0-hash-01-client-1: tcp connect to failed (No route to host)
[2013-10-22 11:07:44.592963] E [rdma.c:4417:tcp_connect_finish] 0-hash-01-client-1: tcp connect to failed (No route to host)
(loop)
......
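
For anyone triaging a similar log, here is a minimal Python sketch that counts error-level entries per client translator, so the unreachable brick (0-hash-01-client-1 in the excerpt above) stands out. It is an illustration only and not part of glusterfs. The names LOG_LINE and failing_clients are made up for this example, and the regular expression assumes the line layout seen in the excerpt: timestamp, level letter, source location, translator name, message.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# [timestamp] LEVEL [file:line:function] translator-name: message
LOG_LINE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<xlator>[^:\s]+):\s+(?P<msg>.*)"
)


def failing_clients(log_text):
    """Return a Counter of error-level (E) entries keyed by translator name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "E":
            counts[match.group("xlator")] += 1
    return counts


# Three lines copied from the log excerpt in this post.
sample = """\
[2013-10-22 11:07:41.335995] I [client-handshake.c:913:client_setvolume_cbk] 0-hash-01-client-0: Connected to 192.168.20.107:24013, attached to remote volume '/data/brick1'.
[2013-10-22 11:07:41.591835] E [rdma.c:4417:tcp_connect_finish] 0-hash-01-client-1: tcp connect to failed (No route to host)
[2013-10-22 11:07:44.592810] E [rdma.c:4417:tcp_connect_finish] 0-hash-01-client-1: tcp connect to failed (No route to host)
"""

if __name__ == "__main__":
    for name, count in failing_clients(sample).most_common():
        print(name, count)
```

Lines that do not match the assumed layout, such as "......" or "(loop)", are skipped rather than treated as errors.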