ABHISHEK PALIWAL
2016-Nov-21 09:17 UTC
[Gluster-users] Duplicate UUID entries in "gluster peer status" command
On Mon, Nov 21, 2016 at 2:28 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

> On Mon, Nov 21, 2016 at 10:00 AM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>
>> Hi Atin,
>>
>> This is an embedded system, and those dates are from before the system
>> got into time sync.
>>
>> Yes, I have also seen these two files in the peers directory on the
>> 002500 board, and I want to know why gluster creates the second file
>> when the old file already exists, especially when the contents of the
>> two files are the same.
>>
>> If we fall into this situation, is it possible for gluster to take care
>> of it itself, instead of us manually doing the steps you mentioned
>> above?
>
> We shouldn't have any unwanted data in /var/lib/glusterd in the first
> place; that is a prerequisite of a gluster installation, failing which
> inconsistencies in the configuration data can't be handled automatically
> and require manual intervention.

Does that mean /var/lib/glusterd should always be empty before the gluster
installation starts? Because in this case nothing unwanted was present
before glusterd was installed.

>> I have some questions:
>>
>> 1. Based on the logs, can we find out the reason for having two peer
>> files with the same contents?
>
> No, we can't: the log file doesn't have any entry for
> 26ae19a6-b58f-446a-b079-411d4ee57450, which indicates that this entry is
> a stale one and has been there for a long time, while the log files are
> the latest.

I agree that 26ae19a6-b58f-446a-b079-411d4ee57450 does not appear in the
logs, but as we checked, that file is the newer one in peers and
5be8603b-18d0-4333-8590-38f918a22857 is the older file.

Also, below are some more entries from the etc-glusterfs-glusterd.log file
on the 002500 board:

The message "I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <10.32.0.48> (<5be8603b-18d0-4333-8590-38f918a22857>), in state <Peer in Cluster>, has disconnected from glusterd." repeated 3 times between [2016-11-17 22:01:23.542556] and [2016-11-17 22:01:36.993584]
The message "W [MSGID: 106118] [glusterd-handler.c:5087:__glusterd_peer_rpc_notify] 0-management: Lock not released for c_glusterfs" repeated 3 times between [2016-11-17 22:01:23.542973] and [2016-11-17 22:01:36.993855]
[2016-11-17 22:01:48.860555] I [MSGID: 106487] [glusterd-handler.c:1411:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2016-11-17 22:01:49.137733] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30706
[2016-11-17 22:01:49.240986] I [MSGID: 106493] [glusterd-rpc-ops.c:694:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 5be8603b-18d0-4333-8590-38f918a22857
[2016-11-17 22:11:58.658884] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(glusterd mgmt) op(--(3)) xid = 0x15 sent 2016-11-17 22:01:48.945424. timeout = 600 for 10.32.0.48:24007
[2016-11-17 22:11:58.658987] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on 10.32.0.48. Please check log file for details.
[2016-11-17 22:11:58.659243] I [socket.c:3382:socket_submit_reply] 0-socket.management: not connected (priv->connected = 255)
[2016-11-17 22:11:58.659265] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2016-11-17 22:11:58.659305] E [MSGID: 106430] [glusterd-utils.c:400:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2016-11-17 22:13:58.674343] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(glusterd mgmt) op(--(3)) xid = 0x11 sent 2016-11-17 22:03:50.268751. timeout = 600 for 10.32.0.48:24007
[2016-11-17 22:13:58.674414] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Staging failed on 10.32.0.48. Please check log file for details.
[2016-11-17 22:13:58.674604] I [socket.c:3382:socket_submit_reply] 0-socket.management: not connected (priv->connected = 255)
[2016-11-17 22:13:58.674627] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2016-11-17 22:13:58.674667] E [MSGID: 106430] [glusterd-utils.c:400:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2016-11-17 22:15:58.687737] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(glusterd mgmt) op(--(3)) xid = 0x17 sent 2016-11-17 22:05:51.341614. timeout = 600 for 10.32.0.48:24007

Are these logs causing the duplicate UUID, or is the duplicate UUID
causing these logs?
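For reference, one quick way to confirm that the two peer files really are
byte-identical is to checksum them. This is only a sketch: the paths assume
the relocated /system/glusterd configuration root mentioned later in this
thread, and the key=value contents shown in comments are illustrative of a
typical peer file, not output captured from these boards:

  # Checksum both peer files on board 002500; matching sums confirm that
  # glusterd stored the same peer twice under two different filenames.
  cd /system/glusterd/peers    # /var/lib/glusterd/peers on a stock install
  md5sum 5be8603b-18d0-4333-8590-38f918a22857 \
         26ae19a6-b58f-446a-b079-411d4ee57450

  # A peer file is a small key=value store; contents typically look like:
  cat 5be8603b-18d0-4333-8590-38f918a22857
  # uuid=5be8603b-18d0-4333-8590-38f918a22857
  # state=3
  # hostname1=10.32.0.48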
>> 2. Is there any way to do it from the gluster code?
>
> Ditto as above.
>
>> Regards,
>> Abhishek
>>
>> On Mon, Nov 21, 2016 at 9:52 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>
>>> atin at dhcp35-96:~/Downloads/gluster_users/abhishek_dup_uuid/duplicate_uuid/glusterd_2500/peers$ ls -lrt
>>> total 8
>>> -rw-------. 1 atin wheel 71 *Jan  1  1970* 5be8603b-18d0-4333-8590-38f918a22857
>>> -rw-------. 1 atin wheel 71 Nov 18 03:31 26ae19a6-b58f-446a-b079-411d4ee57450
>>>
>>> On board 2500, look at the date of the file
>>> 5be8603b-18d0-4333-8590-38f918a22857 (marked in bold). I am not sure
>>> how you ended up with this file at such a timestamp. I am guessing
>>> this could be because the setup was not cleaned properly at the time
>>> of re-installation.
>>>
>>> Here are the steps I'd recommend for now:
>>>
>>> 1. Rename 26ae19a6-b58f-446a-b079-411d4ee57450 to
>>>    5be8603b-18d0-4333-8590-38f918a22857; you should have only one
>>>    entry in the peers folder on board 2500.
>>> 2. Bring down both glusterd instances.
>>> 3. Bring them back one by one.
>>>
>>> And then restart glusterd to see if the issue persists.
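Spelled out as shell commands, the recovery above would look roughly like
this. It is a sketch that assumes glusterd is managed by systemd and uses
the stock /var/lib/glusterd path (substitute /system/glusterd on this
setup):

  systemctl stop glusterd          # step 2: bring down both instances first
  cd /var/lib/glusterd/peers
  # Step 1: the two files have identical contents, so the rename simply
  # overwrites one with the other, leaving a single entry named after the
  # UUID that the logs actually reference.
  mv 26ae19a6-b58f-446a-b079-411d4ee57450 5be8603b-18d0-4333-8590-38f918a22857
  systemctl start glusterd         # step 3: bring nodes back one by one
  gluster peer status              # should now list exactly one peer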
>>> On Mon, Nov 21, 2016 at 9:34 AM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>>>
>>>> Hope you will see it in the logs...
>>>>
>>>> On Mon, Nov 21, 2016 at 9:17 AM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>>>>
>>>>> Hi Atin,
>>>>>
>>>>> It does not get wiped off; we have changed the configuration path
>>>>> from /var/lib/glusterd to /system/glusterd.
>>>>>
>>>>> So the contents remain the same as before.
>>>>>
>>>>> On Mon, Nov 21, 2016 at 9:15 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>>>
>>>>>> Abhishek,
>>>>>>
>>>>>> Rebooting the board does wipe the /var/lib/glusterd contents in
>>>>>> your setup, right (as per my earlier conversation with you)? In
>>>>>> that case, how are you ensuring that the same node gets back the
>>>>>> older UUID? If you don't, then this is bound to happen.
>>>>>>
>>>>>> On Mon, Nov 21, 2016 at 9:11 AM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:
>>>>>>
>>>>>>> Hi Team,
>>>>>>>
>>>>>>> Please look into this problem, as it is very widely seen in our
>>>>>>> system.
>>>>>>>
>>>>>>> We have a replicate volume setup with two bricks, but after
>>>>>>> restarting the second board I get a duplicate entry in the
>>>>>>> "gluster peer status" output, like below:
>>>>>>>
>>>>>>> # gluster peer status
>>>>>>> Number of Peers: 2
>>>>>>>
>>>>>>> Hostname: 10.32.0.48
>>>>>>> Uuid: 5be8603b-18d0-4333-8590-38f918a22857
>>>>>>> State: Peer in Cluster (Connected)
>>>>>>>
>>>>>>> Hostname: 10.32.0.48
>>>>>>> Uuid: 5be8603b-18d0-4333-8590-38f918a22857
>>>>>>> State: Peer in Cluster (Connected)
>>>>>>> #
>>>>>>>
>>>>>>> I am attaching all the logs from both boards and the command
>>>>>>> outputs as well.
>>>>>>>
>>>>>>> So could you please check what the reason is for getting into
>>>>>>> this situation, as it happens very frequently and in multiple
>>>>>>> cases.
>>>>>>>
>>>>>>> Also, we are not replacing any board in the setup, just rebooting
>>>>>>> it.
>>>>>>>
>>>>>>> --
>>>>>>> Regards
>>>>>>> Abhishek Paliwal
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users

--
Regards
Abhishek Paliwal
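A note on Atin's question about the node getting back its older UUID:
glusterd reads its own identity from the glusterd.info file under the
configuration directory, so the UUID survives a reboot only if that file
does. A minimal illustration, assuming the /system/glusterd path used on
these boards; the UUID value shown is made up:

  cat /system/glusterd/glusterd.info   # /var/lib/glusterd/glusterd.info normally
  # UUID=26ae19a6-b58f-446a-b079-411d4ee57450    <- illustrative value
  # operating-version=30706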
ABHISHEK PALIWAL
2016-Nov-21 10:10 UTC
[Gluster-users] Duplicate UUID entries in "gluster peer status" command
I have another set of logs for this problem. In this case we don't have
the timestamp problem, but we still get the same duplicate UUID entries on
board B:

ls -lart
total 16
drwxrwxr-x 12 abhishek abhishek 4096 Oct  3 08:58 ..
drwxrwxr-x  2 abhishek abhishek 4096 Oct  3 08:58 .
-rw-rw-r--  1 abhishek abhishek   71 Oct  3 13:36 d8f66ce8-4154-4246-8084-c63b9cfc1af4
-rw-rw-r--  1 abhishek abhishek   71 Oct  3 13:36 28d37300-425a-4781-b1e0-c13efa2ceee6

I provided logs to you previously as well; I am attaching logs again for
you to analyze. I just want to know why this second peer file is getting
created when there is no entry for it in the logs.

Regards,
Abhishek
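Until the root cause is understood, a quick check along these lines can at
least flag the condition. This is a sketch that assumes GNU coreutils (for
uniq -D -w) and the key=value peer-file layout described earlier in the
thread:

  # Print "uuid filename" for every peer file, then show any UUID value
  # that occurs more than once (a UUID is 36 characters wide).
  awk -F= '/^uuid=/ {print $2, FILENAME}' /system/glusterd/peers/* \
      | sort | uniq -D -w 36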
On Mon, Nov 21, 2016 at 2:47 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com> wrote:

> [...]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs.tar.gz
Type: application/x-gzip
Size: 160108 bytes
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20161121/ebbb194c/attachment-0001.gz>