songxin
2016-Feb-17 12:44 UTC
[Gluster-users] question about sync replicate volume after rebooting one node
Do you mean that I should delete the info file on the B node and then start
glusterd? Or copy it from the A node to the B node?

Sent from my iPhone

> On Feb 17, 2016, at 14:59, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> On 02/17/2016 11:44 AM, songxin wrote:
>> Hi,
>> The version of glusterfs on both the A node and the B node is 3.7.6.
>> The time on the B node is the same after rebooting because the B node
>> has no RTC. Does that cause the problem?
>>
>> If I run "gluster volume start gv0 force" the glusterfsd can be
>> started, but "gluster volume start gv0" doesn't work.
>>
>> The file /var/lib/glusterd/vols/gv0/info on the B node is as below.
>> ...
>> type=2
>> count=2
>> status=1
>> sub_count=2
>> stripe_count=1
>> replica_count=2
>> disperse_count=0
>> redundancy_count=0
>> version=2
>> transport-type=0
>> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
>> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
>> password=ef600dcd-42c5-48fc-8004-d13a3102616b
>> op-version=3
>> client-op-version=3
>> quota-version=0
>> parent_volname=N/A
>> restored_from_snap=00000000-0000-0000-0000-000000000000
>> snap-max-hard-limit=256
>> performance.readdir-ahead=on
>> brick-0=128.224.162.255:-data-brick-gv0
>> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
>>
>> The file /var/lib/glusterd/vols/gv0/info on the A node is as below.
>>
>> wrsadmin at pek-song1-d1:~/work/tmp$ sudo cat /var/lib/glusterd/vols/gv0/info
>> type=2
>> count=2
>> status=1
>> sub_count=2
>> stripe_count=1
>> replica_count=2
>> disperse_count=0
>> redundancy_count=0
>> version=2
>> transport-type=0
>> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
>> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
>> password=ef600dcd-42c5-48fc-8004-d13a3102616b
>> op-version=3
>> client-op-version=3
>> quota-version=0
>> parent_volname=N/A
>> restored_from_snap=00000000-0000-0000-0000-000000000000
>> snap-max-hard-limit=256
>> performance.readdir-ahead=on
>> brick-0=128.224.162.255:-data-brick-gv0
>> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
>
> Contents look identical, but the log says they differ, and that can't
> happen. Are you sure they are the same? As a workaround, can you delete
> that info file from the disk, restart the glusterd instance, and see
> whether the problem persists?
>
>> Thanks,
>> Xin
>>
>> At 2016-02-17 12:01:37, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>
>>>> On 02/17/2016 08:23 AM, songxin wrote:
>>>> Hi,
>>>> Thank you for your immediate and detailed reply. I have a few more
>>>> questions about glusterfs.
>>>> The A node IP is 128.224.162.163.
>>>> The B node IP is 128.224.162.250.
>>>> 1. After rebooting the B node and starting the glusterd service, the
>>>> glusterd log is as below.
>>>> ...
>>>> [2015-12-07 07:54:55.743966] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>>>> [2015-12-07 07:54:55.744026] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>>>> [2015-12-07 07:54:55.744280] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30706
>>>> [2015-12-07 07:54:55.773606] I [MSGID: 106490] [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>>>> [2015-12-07 07:54:55.777994] E [MSGID: 101076] [common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could not lookup hostname of 128.224.162.163 : Temporary failure in name resolution
>>>> [2015-12-07 07:54:55.778290] E [MSGID: 106010] [glusterd-utils.c:2717:glusterd_compare_friend_volume] 0-management: Version of Cksums gv0 differ. local cksum = 2492237955, remote cksum 4087388312 on peer 128.224.162.163
>>>
>>> The above log entry is the reason for the rejection of the peer, most
>>> probably due to a compatibility issue. I believe the gluster versions
>>> on the two nodes are different (share the gluster versions from both
>>> nodes) and you might have hit a bug.
>>>
>>> Can you share the delta of the /var/lib/glusterd/vols/gv0/info file
>>> from both nodes?
>>>
>>> ~Atin
>>>
>>>> [2015-12-07 07:54:55.778384] I [MSGID: 106493] [glusterd-handler.c:3780:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 128.224.162.163 (0), ret: 0
>>>> [2015-12-07 07:54:55.928774] I [MSGID: 106493] [glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44, host: 128.224.162.163, port: 0
>>>> ...
>>>> When I run "gluster peer status" on the B node, it shows as below.
>>>> Number of Peers: 1
>>>>
>>>> Hostname: 128.224.162.163
>>>> Uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>>>> State: Peer Rejected (Connected)
>>>>
>>>> When I run "gluster volume status" on the A node, it shows as below.
>>>>
>>>> Status of volume: gv0
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick 128.224.162.163:/home/wrsadmin/work/t
>>>> mp/data/brick/gv0                           49152     0          Y       13019
>>>> NFS Server on localhost                     N/A       N/A        N       N/A
>>>> Self-heal Daemon on localhost               N/A       N/A        Y       13045
>>>>
>>>> Task Status of Volume gv0
>>>> ------------------------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>> It looks like the glusterfsd service is OK on the A node.
>>>>
>>>> Is it because the peer state is Rejected that glusterd didn't start
>>>> glusterfsd? What causes this problem?
>>>>
>>>> 2. Is glustershd (the self-heal daemon) the process below?
>>>> root 497 0.8 0.0 432520 18104 ? Ssl 08:07 0:00
>>>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>>>> /var/lib/glusterd/glustershd/run/gluster ..
>>>>
>>>> If it is, I want to know whether glustershd is also the glusterfsd
>>>> binary, just like glusterd and glusterfs.
>>>>
>>>> Thanks,
>>>> Xin
>>>>
>>>> At 2016-02-16 18:53:03, "Anuradha Talur" <atalur at redhat.com> wrote:
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "songxin" <songxin_1980 at 126.com>
>>>>>> To: gluster-users at gluster.org
>>>>>> Sent: Tuesday, February 16, 2016 3:59:50 PM
>>>>>> Subject: [Gluster-users] question about sync replicate volume after rebooting one node
>>>>>>
>>>>>> Hi,
>>>>>> I have a question about how to sync a volume between two bricks
>>>>>> after one node is rebooted.
>>>>>>
>>>>>> There are two nodes, the A node and the B node. The A node IP is
>>>>>> 128.124.10.1 and the B node IP is 128.124.10.2.
>>>>>>
>>>>>> Operation steps on the A node as below:
>>>>>> 1. gluster peer probe 128.124.10.2
>>>>>> 2. mkdir -p /data/brick/gv0
>>>>>> 3. gluster volume create gv0 replica 2 128.124.10.1:/data/brick/gv0
>>>>>>    128.124.10.2:/data/brick/gv1 force
>>>>>> 4. gluster volume start gv0
>>>>>> 5. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>>>>>
>>>>>> Operation steps on the B node as below:
>>>>>> 1. mkdir -p /data/brick/gv0
>>>>>> 2. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>>>>>
>>>>>> After all the steps above, there are several gluster service
>>>>>> processes, including glusterd, glusterfs and glusterfsd, running on
>>>>>> both the A and B nodes. I can see these services with the command
>>>>>> "ps aux | grep gluster" and the command "gluster volume status".
>>>>>>
>>>>>> Now reboot the B node. After B reboots, there are no gluster
>>>>>> services running on the B node.
>>>>>> After I run "systemctl start glusterd", there is just the glusterd
>>>>>> service, but not glusterfs and glusterfsd, on the B node.
>>>>>> Because glusterfs and glusterfsd are not running, I can't run
>>>>>> "gluster volume heal gv0 full".
>>>>>>
>>>>>> I want to know why glusterd doesn't start glusterfs and glusterfsd.
>>>>>
>>>>> On starting glusterd, glusterfsd should have started by itself.
>>>>> Could you share the glusterd and brick logs (on node B) so that we
>>>>> know why glusterfsd didn't start?
>>>>>
>>>>> Do you still see the glusterfsd service running on node A? You can
>>>>> try running "gluster v start <VOLNAME> force" on one of the nodes
>>>>> and check whether all the brick processes started.
>>>>>
>>>>> "gluster volume status <VOLNAME>" should be able to provide you with
>>>>> gluster process status.
>>>>>
>>>>> On restarting the node, the glusterfs process for the mount won't
>>>>> start by itself. You will have to run step 2 on node B again for it.
>>>>>
>>>>>> How do I restart these services on the B node?
>>>>>> How do I sync the replicate volume after one node reboots?
>>>>>
>>>>> Once the glusterfsd process starts on node B too, glustershd -- the
>>>>> self-heal daemon -- for the replicate volume should start
>>>>> healing/syncing the files that need to be synced. This daemon does
>>>>> periodic syncing of files.
>>>>>
>>>>> If you want to trigger a heal explicitly, you can run "gluster
>>>>> volume heal <VOLNAME>" on one of the servers.
>>>>>>
>>>>>> Thanks,
>>>>>> Xin
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Anuradha.
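[Editor's note: pulling Anuradha's advice together, the recovery flow on node B
might look like the sketch below. The volume name gv0 and the mount source are
taken from the thread; the mount point /mnt/gluster is an assumed example, and
exact command behavior should be checked against your GlusterFS version.]

```shell
# On node B after a reboot. Starting glusterd should normally respawn
# the brick process (glusterfsd) for volumes it already knows about.
systemctl start glusterd

# If the brick process did not come up, force-start the volume and
# verify with volume status.
gluster volume start gv0 force
gluster volume status gv0

# The FUSE client mount does not return by itself; remount it
# (step 2 from the original message; mount point assumed).
mkdir -p /mnt/gluster
mount -t glusterfs 128.124.10.1:/gv0 /mnt/gluster

# glustershd heals periodically on its own; to trigger an explicit
# heal and inspect pending entries:
gluster volume heal gv0
gluster volume heal gv0 info
```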
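[Editor's note: Xin's second question, whether glustershd runs from the same
binary as glusterfsd, can be checked directly on a node. In common GlusterFS
packaging, glusterd, glusterfs and glusterfsd are different names linked to a
single executable, but that is an assumption worth verifying on your own build
rather than a guarantee.]

```shell
# Compare the installed names; on many distros these are symlinks
# to one executable.
ls -l /usr/sbin/glusterd /usr/sbin/glusterfs /usr/sbin/glusterfsd

# Resolve which executable the running self-heal daemon came from.
pid=$(pgrep -f 'volfile-id gluster/glustershd' | head -n1)
readlink "/proc/${pid}/exe"
```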
Atin Mukherjee
2016-Feb-17 15:47 UTC
[Gluster-users] question about sync replicate volume after rebooting one node
On 02/17/2016 06:14 PM, songxin wrote:
> Do you mean that I should delete the info file on the B node and then
> start glusterd? Or copy it from the A node to the B node?

Either one of them, followed by a restart of GlusterD on B.

> Sent from my iPhone
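[Editor's note: Atin's workaround admits two forms, each followed by a glusterd
restart on B. The sketch below is illustrative only: the file path and node A's
address come from the thread, while the remote user, the use of scp, and
stopping glusterd before touching /var/lib/glusterd are assumptions, not
something stated in the thread.]

```shell
# Run on node B.

# Option 1: delete the stale volume definition and let glusterd
# re-fetch it from peer A on the next handshake.
systemctl stop glusterd
rm /var/lib/glusterd/vols/gv0/info
systemctl start glusterd

# Option 2: copy the info file from node A instead (remote user assumed).
systemctl stop glusterd
scp root@128.224.162.163:/var/lib/glusterd/vols/gv0/info \
    /var/lib/glusterd/vols/gv0/info
systemctl start glusterd

# Either way, confirm the peer is no longer rejected afterwards.
gluster peer status
```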