Rafał Radecki
2017-May-16 05:33 UTC
[Gluster-users] 3.9.1 in docker: problems when one of peers is unavailable.
Hi All.

I have a 9 node dockerized glusterfs cluster and I am seeing the following situation:

1) The docker daemon on the 8th node fails and as a result glusterd on this node leaves the cluster.

2) As a result, on the 1st node I see a message about the 8th node being unavailable:

[2017-05-15 12:48:22.142865] I [MSGID: 106004] [glusterd-handler.c:5808:__glusterd_peer_rpc_notify] 0-management: Peer <10.10.10.8> (<5cb55b7a-1e04-4fb8-bd1d-55ee647719d2>), in state <Peer in Cluster>, has disconnected from glusterd.
[2017-05-15 12:48:22.167746] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.9.1/xlator/mgmt/glusterd.so(+0x2035a) [0x7f7d9d62535a] -->/usr/lib64/glusterfs/3.9.1/xlator/mgmt/glusterd.so(+0x29f48) [0x7f7d9d62ef48] -->/usr/lib64/glusterfs/3.9.1/xlator/mgmt/glusterd.so(+0xd50aa) [0x7f7d9d6da0aa] ) 0-management: Lock for vol csv not held
[2017-05-15 12:48:22.167767] W [MSGID: 106118] [glusterd-handler.c:5833:__glusterd_peer_rpc_notify] 0-management: Lock not released for csv

The gluster share is then unavailable, and when I try to list it I get:

Transport endpoint is not connected

3) Then on the 5th node I see a message similar to 2) about the 1st node being unavailable, and the 5th node also disconnects from the cluster:

[2017-05-15 12:52:54.321189] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.9.1/xlator/mgmt/glusterd.so(+0x2035a) [0x7f7fda22335a] -->/usr/lib64/glusterfs/3.9.1/xlator/mgmt/glusterd.so(+0x29f48) [0x7f7fda22cf48] -->/usr/lib64/glusterfs/3.9.1/xlator/mgmt/glusterd.so(+0xd50aa) [0x7f7fda2d80aa] ) 0-management: Lock for vol csv not held
[2017-05-15 12:52:54.321200] W [MSGID: 106118] [glusterd-handler.c:5833:__glusterd_peer_rpc_notify] 0-management: Lock not released for csv
[2017-05-15 12:53:04.659418] E [socket.c:2307:socket_connect_finish] 0-management: connection to 10.10.10.:24007 failed (Connection refused)

I am quite new to gluster, but as far as I can see this is a chain reaction in which the failure of one node leads to the disconnect of two other nodes. Any hints on how to solve this? Are there any settings for retries/timeouts/reconnects in gluster which could help in my case?

Thanks for all the help!

BR,
Rafal.
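For the retries/timeouts question, one knob commonly suggested in situations like this (an assumption on my part, not something confirmed in the thread) is `network.ping-timeout`, which controls how long clients wait before declaring a brick unreachable. A minimal sketch, assuming the volume name `csv` from the logs and a client mount at the hypothetical path `/mnt/csv`:

```shell
# Hedged sketch: inspecting peer state and tuning reconnect behaviour.
# "csv" is the volume name from the logs; /mnt/csv is a hypothetical
# client mount point -- adjust both to your setup.

# Which peers does glusterd currently consider connected?
gluster peer status

# Current client-side ping timeout (default 42 seconds). Note this
# governs client-to-brick connections, not the glusterd peer RPC
# that is failing in the logs above:
gluster volume get csv network.ping-timeout

# Raise it so a brief docker-daemon hiccup does not immediately
# mark bricks as gone:
gluster volume set csv network.ping-timeout 60

# A "Transport endpoint is not connected" FUSE mount usually needs
# to be remounted once the peers are back:
umount /mnt/csv
mount -t glusterfs 10.10.10.1:/csv /mnt/csv
```

Note the ping timeout will not by itself fix the "Lock not released" warnings on the management side; those come from glusterd's volume lock handling during the disconnect.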
Pranith Kumar Karampuri
2017-May-18 04:10 UTC
[Gluster-users] 3.9.1 in docker: problems when one of peers is unavailable.
Hey,

3.9.1 reached its EndOfLife; you can use either 3.8.x or 3.10.x, which are active at the moment.

On Tue, May 16, 2017 at 11:03 AM, Rafał Radecki <radecki.rafal at gmail.com> wrote:
> [quoted message trimmed]

--
Pranith
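Before planning the upgrade Pranith suggests, it can help to record what each node is actually running. A minimal per-node check, assuming a CentOS/RHEL-based container image (the thread does not say which distro the containers use, so the package-manager line is an assumption):

```shell
# Hedged sketch: pre-upgrade checks on each gluster node/container.

# Version of the glusterfs binaries currently installed:
glusterfs --version | head -n1

# The cluster-wide operating version glusterd has negotiated;
# all peers should show the same value before and after an upgrade:
grep operating-version /var/lib/glusterd/glusterd.info

# Example of pinning the target version at install time on
# CentOS/RHEL (package name and repo availability are assumptions):
# yum install glusterfs-server-3.10\*
```

Upgrading one node at a time and confirming `gluster peer status` shows all peers connected before moving on keeps the volume available throughout.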