thr3ads.net - Gluster users - [Gluster-users] question about sync replicate volume after rebooting one node [Feb 2016]

Atin Mukherjee

2016-Feb-17 06:59 UTC

[Gluster-users] question about sync replicate volume after rebooting one node

On 02/17/2016 11:44 AM, songxin wrote:
> Hi,
> The version of glusterfs on both A node and B node is 3.7.6.
> The time on B node is the same after rebooting because B node has no RTC.
> Does that cause the problem?
>
> If I run "gluster volume start gv0 force", glusterfsd can be
> started, but "gluster volume start gv0" doesn't work.
>
> The file /var/lib/glusterd/vols/gv0/info on B node is as below.
> ...
> type=2
> count=2
> status=1
> sub_count=2
> stripe_count=1
> replica_count=2
> disperse_count=0
> redundancy_count=0
> version=2
> transport-type=0
> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
> password=ef600dcd-42c5-48fc-8004-d13a3102616b
> op-version=3
> client-op-version=3
> quota-version=0
> parent_volname=N/A
> restored_from_snap=00000000-0000-0000-0000-000000000000
> snap-max-hard-limit=256
> performance.readdir-ahead=on
> brick-0=128.224.162.255:-data-brick-gv0
> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0
> 
> The file /var/lib/glusterd/vols/gv0/info on A node is as below.
>
> wrsadmin@pek-song1-d1:~/work/tmp$ sudo cat /var/lib/glusterd/vols/gv0/info
> type=2
> count=2
> status=1
> sub_count=2
> stripe_count=1
> replica_count=2
> disperse_count=0
> redundancy_count=0
> version=2
> transport-type=0
> volume-id=c4197371-6d01-4477-8cb2-384cda569c27
> username=62e009ea-47c4-46b4-8e74-47cd9c199d94
> password=ef600dcd-42c5-48fc-8004-d13a3102616b
> op-version=3
> client-op-version=3
> quota-version=0
> parent_volname=N/A
> restored_from_snap=00000000-0000-0000-0000-000000000000
> snap-max-hard-limit=256
> performance.readdir-ahead=on
> brick-0=128.224.162.255:-data-brick-gv0
> brick-1=128.224.162.163:-home-wrsadmin-work-tmp-data-brick-gv0

The contents look the same, but the log says they differ, and that
can't happen if the files really match. Are you sure they are
identical? As a workaround, can you delete that info file from the
disk, restart the glusterd instance, and see whether the problem
persists?

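One way to check whether the two info files really match is to compare
them key by key. The following is a minimal sketch and not part of the
original thread: the helper names parse_info and info_delta are
hypothetical, and it assumes each file consists of plain key=value lines
like the listings above. It does not reproduce how glusterd computes its
checksum.

```python
def parse_info(text):
    """Parse key=value lines into a dict, skipping blank or malformed lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key] = value
    return entries


def info_delta(local_text, remote_text):
    """Return {key: (local_value, remote_value)} for every key that differs.

    A key missing on one side is reported with None for that side.
    """
    local = parse_info(local_text)
    remote = parse_info(remote_text)
    delta = {}
    for key in sorted(set(local) | set(remote)):
        if local.get(key) != remote.get(key):
            delta[key] = (local.get(key), remote.get(key))
    return delta
```

Passing the contents of the info file from each node returns an empty
dict when every key and value matches.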
> Thanks,
> Xin
> 
> 
> At 2016-02-17 12:01:37, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>
>>
>>On 02/17/2016 08:23 AM, songxin wrote:
>>> Hi,
>>> Thank you for your immediate and detailed reply. I have a few more
>>> questions about glusterfs.
>>> A node IP is 128.224.162.163.
>>> B node IP is 128.224.162.250.
>>> 1. After rebooting B node and starting the glusterd service, the
>>> glusterd log is as below.
>>> ...
>>> [2015-12-07 07:54:55.743966] I [MSGID: 101190]
>>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started
>>> thread with index 2
>>> [2015-12-07 07:54:55.744026] I [MSGID: 101190]
>>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started
>>> thread with index 1
>>> [2015-12-07 07:54:55.744280] I [MSGID: 106163]
>>> [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
>>> 0-management: using the op-version 30706
>>> [2015-12-07 07:54:55.773606] I [MSGID: 106490]
>>> [glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req]
>>> 0-glusterd: Received probe from uuid:
>>> b6efd8fc-5eab-49d4-a537-2750de644a44
>>> [2015-12-07 07:54:55.777994] E [MSGID: 101076]
>>> [common-utils.c:2954:gf_get_hostname_from_ip] 0-common-utils: Could
>>> not lookup hostname of 128.224.162.163 : Temporary failure in name
>>> resolution
>>> [2015-12-07 07:54:55.778290] E [MSGID: 106010]
>>> [glusterd-utils.c:2717:glusterd_compare_friend_volume] 0-management:
>>> Version of Cksums gv0 differ. local cksum = 2492237955, remote
>>> cksum = 4087388312 on peer 128.224.162.163
>>The above log entry is the reason the peer was rejected; most
>>probably it's due to a compatibility issue. I believe the gluster
>>versions differ between the two nodes (please share the gluster
>>versions from both nodes), and you might have hit a bug.
>>
>>Can you share the delta of the /var/lib/glusterd/vols/gv0/info file
>>between the two nodes?
>>
>>
>>~Atin
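
To spot this rejection reason in a longer glusterd log, the checksum
entry can be pulled out with a regular expression. This is a rough
sketch rather than anything from the thread: the function name
find_cksum_mismatches is hypothetical, and the pattern assumes the
message reads "Version of Cksums VOLUME differ. local cksum = N, remote
cksum = N on peer HOST", as in the entry quoted above. It expects raw
log text without mail quote markers.

```python
import re

# Pattern for the checksum-mismatch message; assumes single spaces
# between words, which holds after the whitespace is collapsed below.
CKSUM_RE = re.compile(
    r"Version of Cksums (?P<volume>\S+) differ\. "
    r"local cksum = (?P<local>\d+), remote cksum = (?P<remote>\d+) "
    r"on peer (?P<peer>\S+)"
)


def find_cksum_mismatches(log_text):
    """Return (volume, local_cksum, remote_cksum, peer) for each mismatch."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        (m["volume"], int(m["local"]), int(m["remote"]), m["peer"])
        for m in CKSUM_RE.finditer(flat)
    ]
```

An empty result means no such entry was found in the text supplied; it
says nothing about other reasons a peer might be rejected.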
>>> [2015-12-07 07:54:55.778384] I [MSGID: 106493]
>>> [glusterd-handler.c:3780:glusterd_xfer_friend_add_resp] 0-glusterd:
>>> Responded to 128.224.162.163 (0), ret: 0
>>> [2015-12-07 07:54:55.928774] I [MSGID: 106493]
>>> [glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd:
>>> Received RJT from uuid: b6efd8fc-5eab-49d4-a537-2750de644a44, host:
>>> 128.224.162.163, port: 0
>>> ...
>>> When I run "gluster peer status" on B node, it shows the following.
>>> Number of Peers: 1
>>> 
>>> Hostname: 128.224.162.163
>>> Uuid: b6efd8fc-5eab-49d4-a537-2750de644a44
>>> State: Peer Rejected (Connected)
>>> 
>>> When I run "gluster volume status" on A node, it shows the following.
>>>  
>>> Status of volume: gv0
>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick 128.224.162.163:/home/wrsadmin/work/t
>>> mp/data/brick/gv0                           49152     0          Y       13019
>>> NFS Server on localhost                     N/A       N/A        N       N/A
>>> Self-heal Daemon on localhost               N/A       N/A        Y       13045
>>>
>>> Task Status of Volume gv0
>>> ------------------------------------------------------------------------------
>>> There are no active volume tasks
>>> 
>>> It looks like the glusterfsd service is OK on A node.
>>>
>>> Is it because the peer state is Rejected that glusterd didn't start
>>> glusterfsd? What causes this problem?
>>> 
>>> 
>>> 2. Is glustershd (the self-heal daemon) the process shown below?
>>> root       497  0.8  0.0 432520 18104 ?        Ssl  08:07   0:00
>>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>>> /var/lib/glusterd/glustershd/run/gluster ..
>>>
>>> If it is, I want to know whether glustershd is also the same binary
>>> as glusterfsd, just like glusterd and glusterfs.
>>> 
>>> Thanks,
>>> Xin
>>> 
>>> 
>>> At 2016-02-16 18:53:03, "Anuradha Talur" <atalur at redhat.com> wrote:
>>>>
>>>>
>>>>----- Original Message -----
>>>>> From: "songxin" <songxin_1980 at 126.com>
>>>>> To: gluster-users at gluster.org
>>>>> Sent: Tuesday, February 16, 2016 3:59:50 PM
>>>>> Subject: [Gluster-users] question about sync replicate volume after rebooting one node
>>>>> 
>>>>> Hi,
>>>>> I have a question about how to sync a volume between two bricks
>>>>> after one node is rebooted.
>>>>>
>>>>> There are two nodes, A node and B node. A node's IP is 128.124.10.1
>>>>> and B node's IP is 128.124.10.2.
>>>>> 
>>>>> Operation steps on A node are as below:
>>>>> 1. gluster peer probe 128.124.10.2
>>>>> 2. mkdir -p /data/brick/gv0
>>>>> 3. gluster volume create gv0 replica 2 128.124.10.1:/data/brick/gv0
>>>>>    128.124.10.2:/data/brick/gv1 force
>>>>> 4. gluster volume start gv0
>>>>> 5. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>>>>
>>>>> Operation steps on B node are as below:
>>>>> 1. mkdir -p /data/brick/gv0
>>>>> 2. mount -t glusterfs 128.124.10.1:/gv0 gluster
>>>>> 
>>>>> After all the steps above, there are some gluster service
>>>>> processes, including glusterd, glusterfs and glusterfsd, running
>>>>> on both A and B nodes.
>>>>> I can see these services with the command "ps aux | grep gluster"
>>>>> and the command "gluster volume status".
>>>>>
>>>>> Now reboot the B node. After B reboots, there are no gluster
>>>>> services running on B node.
>>>>> After I run "systemctl start glusterd", there is just the glusterd
>>>>> service, but not glusterfs and glusterfsd, on B node.
>>>>> Because glusterfs and glusterfsd are not running, I can't run
>>>>> "gluster volume heal gv0 full".
>>>>>
>>>>> I want to know why glusterd doesn't start glusterfs and
>>>>> glusterfsd.
>>>>
>>>>On starting glusterd, glusterfsd should have started by itself.
>>>>Could you share the glusterd and brick logs (on node B) so that we
>>>>know why glusterfsd didn't start?
>>>>
>>>>Do you still see the glusterfsd service running on node A? You can
>>>>try running "gluster v start <VOLNAME> force" on one of the nodes
>>>>and check whether all the brick processes started.
>>>>
>>>>"gluster volume status <VOLNAME>" should be able to provide you
>>>>with the gluster process status.
>>>>
>>>>On restarting the node, the glusterfs process for the mount won't
>>>>start by itself. You will have to run step 2 on node B again for it.
>>>>
>>>>> How do I restart these services on B node?
>>>>> How do I sync the replicate volume after one node reboot?
>>>>
>>>>Once the glusterfsd process starts on node B too, glustershd (the
>>>>self-heal daemon) for the replicate volume should start
>>>>healing/syncing the files that need to be synced. This daemon syncs
>>>>files periodically.
>>>>
>>>>If you want to trigger a heal explicitly, you can run "gluster
>>>>volume heal <VOLNAME>" on one of the servers.
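
The sequence suggested in this reply (force-start the volume, check the
process status, then trigger a heal) could be scripted roughly as
follows. This is only a sketch under stated assumptions: the function
names recovery_commands and run_recovery are hypothetical, the commands
are the ones quoted in this thread, and it assumes the gluster CLI is on
the PATH and that the caller has the required privileges.

```python
import subprocess


def recovery_commands(volname):
    """Build the gluster CLI invocations mentioned in the thread, in order."""
    return [
        ["gluster", "volume", "start", volname, "force"],  # restart bricks
        ["gluster", "volume", "status", volname],          # check processes
        ["gluster", "volume", "heal", volname],            # trigger a heal
    ]


def run_recovery(volname, runner=subprocess.run):
    """Run each command in turn; with check=True a failure raises an error."""
    for argv in recovery_commands(volname):
        runner(argv, check=True)
```

The runner parameter exists so the sequence can be dry-run, for example
with a function that only prints each command, before it is run against
a real volume.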
>>>>> 
>>>>> Thanks,
>>>>> Xin
>>>>> 
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>-- 
>>>>Thanks,
>>>>Anuradha.

songxin

2016-Feb-17 12:44 UTC

[Gluster-users] question about sync replicate volume after rebooting one node

Do you mean that I should delete the info file on B node and then start
glusterd? Or should I copy it from A node to B node?

Sent from my iPhone
