Hi,

I have a 3-node replica (including arbiter) volume with GlusterFS 3.8.11, and last night one of my nodes (node1) ran out of memory for some unknown reason, so the Linux OOM killer killed the glusterd and glusterfs processes. I restarted the glusterd process, but now that node is in "Peer Rejected" state from the other nodes, and from its own side it rejects the two other nodes, as you can see below from the output of "gluster peer status":

Number of Peers: 2

Hostname: arbiternode.domain.tld
Uuid: 60a03a81-ba92-4b84-90fe-7b6e35a10975
State: Peer Rejected (Connected)

Hostname: node2.domain.tld
Uuid: 4834dceb-4356-4efb-ad8d-8baba44b967c
State: Peer Rejected (Connected)

I also rebooted node1 just in case, but that did not help.

I read at http://www.spinics.net/lists/gluster-users/msg25803.html that the problem could have something to do with the volume info file, so I checked the file:

/var/lib/glusterd/vols/myvolume/info

It is the same on node1 and arbiternode, but on node2 the order of the following volume parameters is different:

features.quota-deem-statfs=on
features.inode-quota=on
nfs.disable=on
performance.readdir-ahead=on

Could that be the reason why the peer is in rejected state? Can I simply edit this file on node2 to re-order the parameters like on the other two nodes?

What else should I do to investigate the reason for this rejected peer state?

Thank you in advance for the help.

Best,
Mabi
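(A quick way to compare the volume metadata across peers, as a minimal sketch only: it assumes the hostnames node1, node2 and arbiternode from this thread and working ssh between them.)

  # compare checksums of the volume info file on all three peers
  for h in node1.domain.tld node2.domain.tld arbiternode.domain.tld; do
      echo -n "$h: "
      ssh "$h" md5sum /var/lib/glusterd/vols/myvolume/info
  done

  # show the exact differences between two peers
  diff <(ssh node1.domain.tld cat /var/lib/glusterd/vols/myvolume/info) \
       <(ssh node2.domain.tld cat /var/lib/glusterd/vols/myvolume/info)

Note that glusterd compares its own checksums of the volume configuration between peers (as the "Cksums ... differ" log lines later in this thread show), so hand-editing the info file to re-order lines is generally not the safest fix; syncing the whole volume directory from a known-good peer, as suggested in the reply below, is the more common approach.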
On 2017-08-06 15:59, mabi wrote:
> [...]
> What else should I do to investigate the reason for this rejected peer
> state?

Hi mabi.

In my opinion, it is caused by some volfile/checksum mismatch. Try looking at the glusterd log file (/var/log/glusterfs/glusterd.log) on the REJECTED node and find log entries like the ones below:

[2014-06-17 04:21:11.266398] I [glusterd-handler.c:2050:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 81857e74-a726-4f48-8d1b-c2a4bdbc094f
[2014-06-17 04:21:11.266485] E [glusterd-utils.c:2373:glusterd_compare_friend_volume] 0-management: Cksums of volume supportgfs differ. local cksum = 52468988, remote cksum = 2201279699 on peer 172.26.178.254
[2014-06-17 04:21:11.266542] I [glusterd-handler.c:3085:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 172.26.178.254 (0), ret: 0
[2014-06-17 04:21:11.272206] I [glusterd-rpc-ops.c:356:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 81857e74-a726-4f48-8d1b-c2a4bdbc094f, host: 172.26.178.254, port: 0

If that is the case, you need to sync the volfile files/directories under /var/lib/glusterd/vols/<VOLNAME> from one of the GOOD nodes.

For details on how to resolve this problem, please show more information such as the glusterd log :)

--
Best regards.

--
Ji-Hyeon Gim
Research Engineer, Gluesys

Address. Gluesys R&D Center, 5F, 11-31, Simin-daero 327beon-gil,
         Dongan-gu, Anyang-si, Gyeonggi-do, Korea (14055)
Phone.   +82-70-8787-1053
Fax.     +82-31-388-3261
Mobile.  +82-10-7293-8858
E-Mail.  potatogim at potatogim.net
Website. www.potatogim.net

The time I wasted today is the tomorrow the dead man was eager to see yesterday.
- Sophocles
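(As a rough illustration of that sync step, a minimal sketch only: it assumes the volume name "myvolume" from this thread, that node2 is a known-good peer, and that glusterd is managed by systemd; adjust hostnames and service commands to your distribution.)

  # on the rejected node (node1)
  systemctl stop glusterd

  # keep a backup of the current volume configuration before overwriting it
  cp -a /var/lib/glusterd/vols/myvolume /var/lib/glusterd/vols/myvolume.bak.$(date +%F)

  # pull the volume directory from a good peer
  rsync -av --delete node2.domain.tld:/var/lib/glusterd/vols/myvolume/ \
        /var/lib/glusterd/vols/myvolume/

  systemctl start glusterd
  gluster peer status

Do not copy /var/lib/glusterd/glusterd.info between nodes; it holds each peer's own UUID.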
Hi Ji-Hyeon,

Thanks to your help I could find the problematic file. It is the quota file of my volume: it has a different checksum on node1, whereas node2 and arbiternode have the same checksum. This is expected, as I had issues with my quota file and had to fix it manually with a script (more details on this mailing list in a previous post), and I only did that on node1.

So what I did now is copy the /var/lib/glusterd/vols/myvolume/quota.conf file from node1 to node2 and arbiternode and then restart the glusterd process on node1, but somehow this did not fix the issue. I suppose I am missing a step here; maybe you have an idea what it is?

Here is the relevant part of my glusterd.log file taken from node1:

[2017-08-06 08:16:57.699131] E [MSGID: 106012] [glusterd-utils.c:2988:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvolume differ. local cksum = 3823389269, remote cksum = 733515336 on peer node2.domain.tld
[2017-08-06 08:16:57.275558] E [MSGID: 106012] [glusterd-utils.c:2988:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvolume differ. local cksum = 3823389269, remote cksum = 733515336 on peer arbiternode.intra.oriented.ch

Best regards,
Mabi

> -------- Original Message --------
> Subject: Re: [Gluster-users] State: Peer Rejected (Connected)
> Local Time: August 6, 2017 9:31 AM
> UTC Time: August 6, 2017 7:31 AM
> From: potatogim at potatogim.net
> To: mabi <mabi at protonmail.ch>
> Gluster Users <gluster-users at gluster.org>
>
> [...]
>
> In my opinion, it is caused by some volfile/checksum mismatch. Try looking
> at the glusterd log file (/var/log/glusterfs/glusterd.log) on the REJECTED
> node and find log entries like the ones below:
>
> [...]
>
> If that is the case, you need to sync the volfile files/directories under
> /var/lib/glusterd/vols/<VOLNAME> from one of the GOOD nodes.
>
> For details on how to resolve this problem, please show more information
> such as the glusterd log :)
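(A follow-up sketch that may help at this point. Assumptions: the paths and hostnames from this thread, working ssh between the nodes, and that your GlusterFS version keeps a cached quota.cksum file next to quota.conf — check whether it exists on your nodes before relying on it.)

  # verify that quota.conf is now byte-identical on all three peers
  for h in node1.domain.tld node2.domain.tld arbiternode.domain.tld; do
      echo -n "$h: "
      ssh "$h" md5sum /var/lib/glusterd/vols/myvolume/quota.conf
  done

  # if a cached checksum file exists, compare it as well; a stale copy here
  # can keep the "Cksums of quota configuration ... differ" error alive
  for h in node1.domain.tld node2.domain.tld arbiternode.domain.tld; do
      echo -n "$h: "
      ssh "$h" 'cat /var/lib/glusterd/vols/myvolume/quota.cksum 2>/dev/null || echo "no quota.cksum"'
  done

  # then restart glusterd on every peer (not only node1) so the peers
  # re-run the friend handshake and re-compare their configuration checksums
  systemctl restart glusterd

Since glusterd was only restarted on node1 in the message above, restarting it on node2 and arbiternode as well is probably the first thing to try.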