Atin Mukherjee
2016-Nov-21 08:58 UTC
[Gluster-users] Duplicate UUID entries in "gluster peer status" command
On Mon, Nov 21, 2016 at 10:00 AM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:

> Hi Atin,
>
> The system is an embedded system, and these dates are from before the
> system got into timer sync.
>
> Yes, I have also seen these two files in the peers directory on the
> 002500 board, and I want to know why gluster creates the second file
> when the old file already exists. Even the contents of these files are
> the same.
>
> If we fall into this situation, is it possible for gluster to take care
> of it itself, instead of us manually doing the steps you mentioned
> above?

We shouldn't have any unwanted data in /var/lib/glusterd in the first
place; that is a prerequisite of a gluster installation. Failing that,
inconsistencies in the configuration data can't be handled automatically
and need manual intervention.

> I have some questions:
>
> 1. Based on the logs, can we find out the reason for having two peer
> files with the same contents?

No, we can't: the log file doesn't have any entry for
26ae19a6-b58f-446a-b079-411d4ee57450, which indicates that this entry is
a stale one and has been there for a long time, whereas the log files
are the latest.

> 2. Is there any way to do it from the gluster code?

Ditto as above.

> Regards,
> Abhishek
>
> On Mon, Nov 21, 2016 at 9:52 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> atin at dhcp35-96:~/Downloads/gluster_users/abhishek_dup_uuid/duplicate_uuid/glusterd_2500/peers$ ls -lrt
>> total 8
>> -rw-------. 1 atin wheel 71 *Jan 1 1970* 5be8603b-18d0-4333-8590-38f918a22857
>> -rw-------. 1 atin wheel 71 Nov 18 03:31 26ae19a6-b58f-446a-b079-411d4ee57450
>>
>> On board 2500, look at the date of the file
>> 5be8603b-18d0-4333-8590-38f918a22857 (marked in bold). I am not sure
>> how you ended up having this file with such a timestamp. I am guessing
>> this could be because the setup was not cleaned properly at the time
>> of re-installation.
>> Here are the steps I'd recommend for now:
>>
>> 1. Rename 26ae19a6-b58f-446a-b079-411d4ee57450 to
>> 5be8603b-18d0-4333-8590-38f918a22857; you should then have only one
>> entry in the peers folder on board 2500.
>> 2. Bring down both glusterd instances.
>> 3. Bring them back one by one.
>>
>> And then restart glusterd to see if the issue persists.
>>
>> On Mon, Nov 21, 2016 at 9:34 AM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>>
>>> Hope you will see it in the logs......
>>>
>>> On Mon, Nov 21, 2016 at 9:17 AM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>>>
>>>> Hi Atin,
>>>>
>>>> It is not getting wiped off; we have changed the configuration path
>>>> from /var/lib/glusterd to /system/glusterd.
>>>>
>>>> So they will remain the same as before.
>>>>
>>>> On Mon, Nov 21, 2016 at 9:15 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>
>>>>> Abhishek,
>>>>>
>>>>> Rebooting the board does wipe off the /var/lib/glusterd contents in
>>>>> your setup, right (as per my earlier conversation with you)? In
>>>>> that case, how are you ensuring that the same node gets back the
>>>>> older UUID? If you don't, then this is bound to happen.
>>>>>
>>>>> On Mon, Nov 21, 2016 at 9:11 AM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>>>>>
>>>>>> Hi Team,
>>>>>>
>>>>>> Please look into this problem, as it is very widely seen in our
>>>>>> system.
>>>>>> We have a replicate volume setup with two bricks, but after
>>>>>> restarting the second board I am getting a duplicate entry in the
>>>>>> "gluster peer status" output, like below:
>>>>>>
>>>>>> # gluster peer status
>>>>>> Number of Peers: 2
>>>>>>
>>>>>> Hostname: 10.32.0.48
>>>>>> Uuid: 5be8603b-18d0-4333-8590-38f918a22857
>>>>>> State: Peer in Cluster (Connected)
>>>>>>
>>>>>> Hostname: 10.32.0.48
>>>>>> Uuid: 5be8603b-18d0-4333-8590-38f918a22857
>>>>>> State: Peer in Cluster (Connected)
>>>>>> #
>>>>>>
>>>>>> I am attaching all logs from both boards, and the command outputs
>>>>>> as well.
>>>>>>
>>>>>> So could you please check what the reason is for getting into this
>>>>>> situation, as it is very frequent in multiple cases.
>>>>>>
>>>>>> Also, we are not replacing any board in the setup, just rebooting.
>>>>>>
>>>>>> --
>>>>>> Regards
>>>>>> Abhishek Paliwal
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users

--
~ Atin (atinm)
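The three recovery steps above can be sketched as a small shell helper. This is a sketch only, not an official gluster tool: `dedupe_peer_file` is a hypothetical name, the UUIDs are the ones from this thread, and the peers path in the usage note is the stock /var/lib/glusterd/peers (on Abhishek's boards the configuration lives under /system/glusterd instead).

```shell
#!/bin/sh
# Sketch of step 1 only (collapse the stale peer file onto the valid
# UUID so exactly one entry remains). Steps 2-3 -- stopping glusterd on
# both boards and starting them back one at a time -- still have to be
# done by hand. All names here are assumptions taken from this thread.

dedupe_peer_file() {
    # $1 = peers directory, $2 = stale file name, $3 = valid peer UUID
    peers_dir=$1; stale=$2; valid=$3
    if [ -f "$peers_dir/$stale" ]; then
        # Both files carry identical contents in this thread, so the
        # rename simply merges them into a single entry.
        mv "$peers_dir/$stale" "$peers_dir/$valid"
    fi
}

# Usage on board 002500 (then restart glusterd as in steps 2-3):
# dedupe_peer_file /var/lib/glusterd/peers \
#     26ae19a6-b58f-446a-b079-411d4ee57450 \
#     5be8603b-18d0-4333-8590-38f918a22857
```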
ABHISHEK PALIWAL
2016-Nov-21 09:17 UTC
[Gluster-users] Duplicate UUID entries in "gluster peer status" command
On Mon, Nov 21, 2016 at 2:28 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

> We shouldn't have any unwanted data in /var/lib/glusterd in the first
> place; that is a prerequisite of a gluster installation. Failing that,
> inconsistencies in the configuration data can't be handled automatically
> and need manual intervention.

Does that mean /var/lib/glusterd should always be empty before starting
the gluster installation? Because in this case nothing unwanted was there
before installing glusterd.

>> 1. Based on the logs, can we find out the reason for having two peer
>> files with the same contents?
>
> No, we can't: the log file doesn't have any entry for
> 26ae19a6-b58f-446a-b079-411d4ee57450, which indicates that this entry is
> a stale one and has been there for a long time, whereas the log files
> are the latest.

I agree that the 26ae19a6-b58f-446a-b079-411d4ee57450 entry is not in
the logs, but as we checked, this file is the newer one in peers and
5be8603b-18d0-4333-8590-38f918a22857 is the older file.

Also, below are some more logs from the etc-glusterfs-glusterd.log file
on the 002500 board:

The message "I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <10.32.0.48> (<5be8603b-18d0-4333-8590-38f918a22857>), in state <Peer in Cluster>, has disconnected from glusterd."
repeated 3 times between [2016-11-17 22:01:23.542556] and [2016-11-17 22:01:36.993584]
The message "W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for c_glusterfs" repeated 3 times between [2016-11-17 22:01:23.542973] and [2016-11-17 22:01:36.993855]
[2016-11-17 22:01:48.860555] I [MSGID: 106487] [glusterd-handler.c:1411:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2016-11-17 22:01:49.137733] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30706
[2016-11-17 22:01:49.240986] I [MSGID: 106493] [glusterd-rpc-ops.c:694:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 5be8603b-18d0-4333-8590-38f918a22857
[2016-11-17 22:11:58.658884] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(glusterd mgmt) op(--(3)) xid = 0x15 sent 2016-11-17 22:01:48.945424. timeout = 600 for 10.32.0.48:24007
[2016-11-17 22:11:58.658987] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on 10.32.0.48. Please check log file for details.
[2016-11-17 22:11:58.659243] I [socket.c:3382:socket_submit_reply] 0-socket.management: not connected (priv->connected = 255)
[2016-11-17 22:11:58.659265] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2016-11-17 22:11:58.659305] E [MSGID: 106430] [glusterd-utils.c:400:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2016-11-17 22:13:58.674343] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(glusterd mgmt) op(--(3)) xid = 0x11 sent 2016-11-17 22:03:50.268751. timeout = 600 for 10.32.0.48:24007
[2016-11-17 22:13:58.674414] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on 10.32.0.48. Please check log file for details.
[2016-11-17 22:13:58.674604] I [socket.c:3382:socket_submit_reply] 0-socket.management: not connected (priv->connected = 255)
[2016-11-17 22:13:58.674627] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2016-11-17 22:13:58.674667] E [MSGID: 106430] [glusterd-utils.c:400:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2016-11-17 22:15:58.687737] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(glusterd mgmt) op(--(3)) xid = 0x17 sent 2016-11-17 22:05:51.341614. timeout = 600 for 10.32.0.48:24007

Are these logs causing the duplicate UUID, or is the duplicate UUID
causing these logs?

--
Regards
Abhishek Paliwal
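Atin's staleness heuristic from earlier in the thread (a peer file whose UUID never appears in the glusterd log) can be sketched as a shell check. `find_stale_peers` is a hypothetical helper, not part of gluster; the paths in the usage note are the stock ones and would need adjusting for the /system/glusterd layout used on these boards.

```shell
#!/bin/sh
# Flag peer files whose UUID is never mentioned in the glusterd log --
# the heuristic used in this thread to identify the 26ae19a6-... entry
# as stale. A flagged file is only a candidate; inspect it by hand.

find_stale_peers() {
    # $1 = glusterd peers directory, $2 = glusterd log file
    peers_dir=$1; logfile=$2
    for f in "$peers_dir"/*; do
        [ -f "$f" ] || continue
        uuid=$(basename "$f")
        # A UUID that appears nowhere in the recent log is suspect.
        grep -q "$uuid" "$logfile" || echo "$uuid"
    done
}

# Usage (stock paths; adjust for a relocated configuration directory):
# find_stale_peers /var/lib/glusterd/peers \
#     /var/log/glusterfs/etc-glusterfs-glusterd.log
```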