Logs from the newly added node helped me find the root cause of the issue.

The info file on node 10.5.6.17 contains an additional property, "tier-enabled", which is not present in the info file on the other 3 nodes. When a gluster peer probe call is made, the cksum is compared in order to maintain consistency across the cluster. In this case the two files differ, so their cksums differ, and the peer ends up in "State: Peer Rejected (Connected)".

This inconsistency arose from the upgrade you did.

Workaround:
1. Go to node 10.5.6.17.
2. Open the info file at "/var/lib/glusterd/vols/<vol-name>/info" and remove "tier-enabled=0".
3. Restart the glusterd service.
4. Peer probe again.

Thanks
Gaurav

On Thu, Aug 31, 2017 at 3:37 PM, lejeczek <peljasz at yahoo.co.uk> wrote:
> attached the lot as per your request.
>
> Would be really great if you can find the root cause of this and suggest
> a resolution. Fingers crossed.
> thanks, L.
>
> On 31/08/17 05:34, Gaurav Yadav wrote:
>
>> Could you please send the entire content of the "/var/lib/glusterd/"
>> directory of the 4th node, which is being peer probed, along with
>> command-history and glusterd.logs.
>>
>> Thanks
>> Gaurav
>>
>> On Wed, Aug 30, 2017 at 7:10 PM, lejeczek <peljasz at yahoo.co.uk> wrote:
>>
>>     On 30/08/17 07:18, Gaurav Yadav wrote:
>>
>>         Could you please send me the "info" file which is
>>         placed in the "/var/lib/glusterd/vols/<vol-name>"
>>         directory from all the nodes, along with
>>         glusterd.logs and command-history.
>>
>>         Thanks
>>         Gaurav
>>
>>         On Tue, Aug 29, 2017 at 7:13 PM, lejeczek <peljasz at yahoo.co.uk> wrote:
>>
>>             hi fellas,
>>             same old same
>>             in log of the probing peer I see:
>>             ...
>>             [2017-08-29 13:36:16.882196] I [MSGID: 106493] [glusterd-handler.c:3020:__glusterd_handle_probe_query] 0-glusterd: Responded to priv.xx.xx.priv.xx.xx.x, op_ret: 0, op_errno: 0, ret: 0
>>             [2017-08-29 13:36:16.904961] I [MSGID: 106490] [glusterd-handler.c:2606:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 2a17edb4-ae68-4b67-916e-e38a2087ca28
>>             [2017-08-29 13:36:16.906477] E [MSGID: 106010] [glusterd-utils.c:3034:glusterd_compare_friend_volume] 0-management: Version of Cksums CO-DATA differ. local cksum = 4088157353, remote cksum = 2870780063 on peer 10.5.6.17
>>             [2017-08-29 13:36:16.907187] I [MSGID: 106493] [glusterd-handler.c:3866:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.5.6.17 (0), ret: 0, op_ret: -1
>>             ...
>>
>>             Why would adding a new peer make cluster jump to check
>>             checksums on a vol on that newly added peer?
>>
>>     really. I mean, no brick even exists on newly added
>>     peer, it's just been probed, why this?:
>>
>>     [2017-08-30 13:17:51.949430] E [MSGID: 106010] [glusterd-utils.c:3034:glusterd_compare_friend_volume] 0-management: Version of Cksums CO-DATA differ. local cksum = 4088157353, remote cksum = 2870780063 on peer 10.5.6.17
>>
>>     10.5.6.17 is a candidate I'm probing from a working
>>     cluster.
>>     Why does gluster want checksums and why would the checksums be
>>     different?
>>     Would anybody know what is going on there?
>>
>>             Is it why the peer gets rejected?
>>             That peer I'm hoping to add was a member of the
>>             cluster in the past, but I did the "usual" wipe of
>>             /var/lib/gluster on the candidate peer.
>>
>>             a hint, solution would be great to hear.
>>             L.

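For anyone following the thread, the mismatch described in the root-cause message can be spotted by comparing the key=value lines of the info file from two nodes. What follows is only a minimal sketch under assumptions: the function names (`parse_info`, `diff_info`, `drop_key`) are made up for illustration, the sample contents are hypothetical and heavily shortened, and the code makes no claim about how glusterd itself computes the cksum.

```python
def parse_info(text):
    """Parse the key=value lines of a volume info file into a dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        entries[key] = value
    return entries


def diff_info(local_text, remote_text):
    """Return (keys only in local, keys only in remote, keys with different values)."""
    local, remote = parse_info(local_text), parse_info(remote_text)
    only_local = sorted(set(local) - set(remote))
    only_remote = sorted(set(remote) - set(local))
    changed = sorted(k for k in set(local) & set(remote) if local[k] != remote[k])
    return only_local, only_remote, changed


def drop_key(text, key):
    """Return the info text without any line that sets `key` (step 2 of the workaround)."""
    kept = [l for l in text.splitlines() if l.partition("=")[0].strip() != key]
    return "\n".join(kept) + "\n"


# Hypothetical, shortened contents; real info files carry many more keys.
CLUSTER_INFO = "type=2\ncount=3\nstatus=1\n"
NEW_NODE_INFO = "type=2\ncount=3\nstatus=1\ntier-enabled=0\n"

if __name__ == "__main__":
    print(diff_info(CLUSTER_INFO, NEW_NODE_INFO))
    print(drop_key(NEW_NODE_INFO, "tier-enabled") == CLUSTER_INFO)
```

In practice the two strings would be read from "/var/lib/glusterd/vols/<vol-name>/info" on an existing node and on 10.5.6.17. Any key reported as present on one side only is a candidate for the cksum difference.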
hi, still tricky

whether I do or do not remove "tier-enabled=0" on the rejected peer, when I try to restart the glusterd service there, the restart fails:

lusterd version 3.10.5 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2017-09-01 07:41:08.251314] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[2017-09-01 07:41:08.251400] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[2017-09-01 07:41:08.275000] W [MSGID: 103071] [rdma.c:4590:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2017-09-01 07:41:08.275071] W [MSGID: 103055] [rdma.c:4897:init] 0-rdma.management: Failed to initialize IB Device
[2017-09-01 07:41:08.275096] W [rpc-transport.c:350:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2017-09-01 07:41:08.275307] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2017-09-01 07:41:08.275343] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2017-09-01 07:41:13.941020] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30712
[2017-09-01 07:41:14.109192] I [MSGID: 106498] [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2017-09-01 07:41:14.109364] W [MSGID: 106062] [glusterd-handler.c:3466:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2017-09-01 07:41:14.109481] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-09-01 07:41:14.134691] E [MSGID: 106187] [glusterd-store.c:4559:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2017-09-01 07:41:14.134769] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2017-09-01 07:41:14.134790] E [MSGID: 101066] [graph.c:325:glusterfs_graph_init] 0-management: initializing translator failed
[2017-09-01 07:41:14.134804] E [MSGID: 101176] [graph.c:681:glusterfs_graph_activate] 0-graph: init failed
[2017-09-01 07:41:14.135723] W [glusterfsd.c:1332:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x55f22fab3abd] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x1b1) [0x55f22fab3961] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x55f22fab2e4b] ) 0-: received signum (1), shutting down

I have to wipe /var/lib/glusterd clean on the rejected peer (10.5.6.17) and only then can I restart it, but.. I probe it anew and then "tier-enabled=0" lands in the "info" file for each vol on 10.5.6.17 and... vicious circle?

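A small sketch for pulling the error entries out of a glusterd log like the one pasted in the restart-failure message. It assumes the line layout visible in that paste, `[timestamp] LEVEL [MSGID: n] [file:line:function] message`, with the MSGID part optional. The regex and the names `parse_line` and `errors` are made up for illustration and are not part of any gluster tooling; the sample lines are copied from the thread.

```python
import re

# Layout assumed from the pasted log; the MSGID block is optional.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def errors(lines):
    """Return the parsed entries whose level is E, in log order."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "E"]


SAMPLE = [
    "[2017-09-01 07:41:08.275096] W [rpc-transport.c:350:rpc_transport_load] "
    "0-rpc-transport: 'rdma' initialization failed",
    "[2017-09-01 07:41:08.275343] E [MSGID: 106243] [glusterd.c:1720:init] "
    "0-management: creation of 1 listeners failed, continuing with succeeded transport",
    "[2017-09-01 07:41:14.134691] E [MSGID: 106187] "
    "[glusterd-store.c:4559:glusterd_resolve_all_bricks] "
    "0-glusterd: resolve brick failed in restore",
]

if __name__ == "__main__":
    for entry in errors(SAMPLE):
        print(entry["ts"], entry["msgid"], entry["src"], entry["msg"])
```

Read this way, the 106243 entry says glusterd is "continuing", while the 106187 entry ("resolve brick failed in restore") is the one immediately followed by the init failures and the shutdown, so it looks like the more useful starting point. That reading is an inference from the order of the log lines, not something confirmed in the thread.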
I've also noticed, in case it's abnormal and might help, that when the rejected peer gets probed it shows up only on the probing peer (though as rejected). The remaining two peers do not show the rejected peer in their status, and the rejected peer shows only the peer it got probed from.
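The asymmetry described in that last observation can be made explicit by collecting the `gluster peer status` output from every node and listing which members each node fails to report. This is a rough sketch under assumptions: it relies on the usual `Hostname:` / `Uuid:` / `State:` block layout of that command, the hostnames node-a, node-b and node-c are placeholders for the three working nodes, and every state and UUID in the sample data is invented to mirror the description, not taken from the actual cluster.

```python
def parse_peer_status(output):
    """Parse `gluster peer status` text into {hostname: state}."""
    peers, host = {}, None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Hostname:"):
            host = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and host is not None:
            peers[host] = line.split(":", 1)[1].strip()
            host = None
    return peers


def missing_views(views):
    """For each node, list the known members it does not report as peers."""
    everyone = set(views)
    for seen in views.values():
        everyone |= set(seen)
    return {
        node: sorted(everyone - set(seen) - {node})
        for node, seen in views.items()
    }


# Hypothetical output as it might look on the rejected peer.
REJECTED_PEER_OUTPUT = """Number of Peers: 1

Hostname: node-a
Uuid: 00000000-0000-0000-0000-000000000000
State: Peer Rejected (Connected)
"""

OK = "Peer in Cluster (Connected)"
# Hypothetical views: node-a is the probing peer.
VIEWS = {
    "node-a": {"node-b": OK, "node-c": OK, "10.5.6.17": "Peer Rejected (Connected)"},
    "node-b": {"node-a": OK, "node-c": OK},
    "node-c": {"node-a": OK, "node-b": OK},
    "10.5.6.17": parse_peer_status(REJECTED_PEER_OUTPUT),
}

if __name__ == "__main__":
    for node, missing in missing_views(VIEWS).items():
        print(node, "does not list:", missing)
```

With the sample data, only the probing peer has a complete view, which matches what was reported: the other two nodes never list 10.5.6.17, and 10.5.6.17 lists only the node that probed it.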