Kotresh Hiremath Ravishankar
2016-May-24 10:40 UTC
[Gluster-users] Re: Re: Re: Re: Re: geo-replication status partial faulty
Ok, it looks like there is a problem with ssh key distribution. Before I suggest cleaning those up and doing the setup again, could you share the output of the following commands?

1. gluster vol geo-rep <master_vol> <slave_host>::<slave_vol> status
2. ls -l /var/lib/glusterd/geo-replication/

Are there multiple geo-rep sessions from this master volume, or only one?

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "vyyy杨雨阳" <yuyangyang at ctrip.com>
> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> Cc: "Saravanakumar Arumugam" <sarumuga at redhat.com>, Gluster-users at gluster.org, "Aravinda Vishwanathapura Krishna Murthy" <avishwan at redhat.com>
> Sent: Tuesday, May 24, 2016 3:19:55 PM
> Subject: Re: Re: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
>
> We can establish passwordless ssh directly with the 'ssh' command, but when we run 'create push-pem' it shows 'Passwordless ssh login has not been setup' unless we copy secret.pem to *id_rsa.pub.
>
> [root at SVR8048HW2285 ~]# ssh -i /var/lib/glusterd/geo-replication/secret.pem root at glusterfs01.sh3.ctripcorp.com
> Last login: Tue May 24 17:23:53 2016 from 10.8.230.213
> This is a private network server, in monitoring state.
> It is strictly prohibited to unauthorized access and used.
> [root at SVR6519HW2285 ~]#
>
> [root at SVR8048HW2285 filews]# gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
> Passwordless ssh login has not been setup with glusterfs01.sh3.ctripcorp.com for user root.
> geo-replication command failed
> [root at SVR8048HW2285 filews]#
>
> Best Regards
> 杨雨阳 Yuyang Yang
>
> -----Original Message-----
> From: Kotresh Hiremath Ravishankar [mailto:khiremat at redhat.com]
> Sent: Tuesday, May 24, 2016 3:22 PM
> To: vyyy杨雨阳 <yuyangyang at Ctrip.com>
> Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
> Subject: Re: Re: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
>
> Hi,
>
> Could you try the following command from the corresponding masters to the faulty slave nodes and share the output?
> The command below should not ask for a password and should run gsyncd.
>
> ssh -i /var/lib/glusterd/geo-replication/secret.pem root@<faulty hosts>
>
> To establish passwordless ssh, it is not necessary to copy secret.pem to *id_rsa.pub.
>
> If the geo-rep session is already established, passwordless ssh would already be there.
> My suspicion is that when I asked you to do 'create force', you did it using another slave where passwordless ssh was not set up. This would create another session directory in '/var/lib/glusterd/geo-replication', i.e. <master_vol>_<slave_host>_<slave_vol>.
>
> Please check and let us know.
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
> > From: "vyyy杨雨阳" <yuyangyang at ctrip.com>
> > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > Cc: "Saravanakumar Arumugam" <sarumuga at redhat.com>, Gluster-users at gluster.org, "Aravinda Vishwanathapura Krishna Murthy" <avishwan at redhat.com>
> > Sent: Friday, May 20, 2016 12:35:58 PM
> > Subject: Re: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
> >
> > Hello, Kotresh
> >
> > I ran 'create force', but still some nodes work and some nodes are faulty.
> >
> > On the faulty nodes, etc-glusterfs-glusterd.vol.log shows:
> >
> > [2016-05-20 06:27:03.260870] I [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > [2016-05-20 06:27:03.404544] E [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file
> > [2016-05-20 06:27:03.404583] E [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable to read the statusfile for /export/sdb/brick1 brick for filews(master), glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
> >
> > /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log shows:
> >
> > [2016-05-20 15:04:01.858340] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
> > [2016-05-20 15:04:01.858688] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
> > [2016-05-20 15:04:01.986754] D [gsyncd(agent):627:main_i] <top>: rpc_fd: '7,11,10,9'
> > [2016-05-20 15:04:01.987505] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
> > [2016-05-20 15:04:01.988079] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
> > [2016-05-20 15:04:01.988238] I [syncdutils(agent):214:finalize] <top>: exiting.
> > [2016-05-20 15:04:01.988250] I [monitor(monitor):267:monitor] Monitor: worker(/export/sdb/brick1) died before establishing connection
> >
> > Can you help me?
> >
> > Best Regards
> > 杨雨阳 Yuyang Yang
> >
> > -----Original Message-----
> > From: vyyy杨雨阳
> > Sent: Thursday, May 19, 2016 7:45 PM
> > To: 'Kotresh Hiremath Ravishankar' <khiremat at redhat.com>
> > Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
> > Subject: Re: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
> >
> > It still does not work.
> >
> > I need to copy /var/lib/glusterd/geo-replication/secret.* to /root/.ssh/id_rsa to make passwordless ssh work.
> >
> > I generated the /var/lib/glusterd/geo-replication/secret.pem file on every master node.
> >
> > I am not sure whether this is right.
> >
> > [root at sh02svr5956 ~]# gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
> > Passwordless ssh login has not been setup with glusterfs01.sh3.ctripcorp.com for user root.
> > geo-replication command failed
> >
> > [root at sh02svr5956 .ssh]# cp /var/lib/glusterd/geo-replication/secret.pem ./id_rsa
> > cp: overwrite `./id_rsa'? y
> > [root at sh02svr5956 .ssh]# cp /var/lib/glusterd/geo-replication/secret.pem.pub ./id_rsa.pub
> > cp: overwrite `./id_rsa.pub'?
> >
> > [root at sh02svr5956 ~]# gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
> > Creating geo-replication session between filews & glusterfs01.sh3.ctripcorp.com::filews_slave has been successful
> > [root at sh02svr5956 ~]#
> >
> > Best Regards
> > 杨雨阳 Yuyang Yang
> > OPS
> > Ctrip Infrastructure Service (CIS)
> > Ctrip Computer Technology (Shanghai) Co., Ltd
> > Phone: +86 21 34064880-15554 | Fax: +86 21 52514588-13389
> > Web: www.Ctrip.com
> >
> > -----Original Message-----
> > From: Kotresh Hiremath Ravishankar [mailto:khiremat at redhat.com]
> > Sent: Thursday, May 19, 2016 5:07 PM
> > To: vyyy杨雨阳 <yuyangyang at Ctrip.com>
> > Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
> > Subject: Re: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
> >
> > Hi,
> >
> > Could you just try 'create force' once to fix those status file errors?
> >
> > e.g., 'gluster volume geo-rep <master vol> <slave host>::<slave vol> create push-pem force'
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > ----- Original Message -----
> > > From: "vyyy杨雨阳" <yuyangyang at ctrip.com>
> > > To: "Saravanakumar Arumugam" <sarumuga at redhat.com>, Gluster-users at gluster.org, "Aravinda Vishwanathapura Krishna Murthy" <avishwan at redhat.com>, "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > > Sent: Thursday, May 19, 2016 2:15:34 PM
> > > Subject: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
> > >
> > > I have checked all the nodes, both masters and slaves; the software is the same.
> > >
> > > I am puzzled why half of the masters work and half are faulty.
> > >
> > > [admin at SVR6996HW2285 ~]$ rpm -qa |grep gluster
> > > glusterfs-api-3.6.3-1.el6.x86_64
> > > glusterfs-fuse-3.6.3-1.el6.x86_64
> > > glusterfs-geo-replication-3.6.3-1.el6.x86_64
> > > glusterfs-3.6.3-1.el6.x86_64
> > > glusterfs-cli-3.6.3-1.el6.x86_64
> > > glusterfs-server-3.6.3-1.el6.x86_64
> > > glusterfs-libs-3.6.3-1.el6.x86_64
> > >
> > > Best Regards
> > > 杨雨阳 Yuyang Yang
> > >
> > > OPS
> > > Ctrip Infrastructure Service (CIS)
> > > Ctrip Computer Technology (Shanghai) Co., Ltd
> > > Phone: +86 21 34064880-15554 | Fax: +86 21 52514588-13389
> > > Web: www.Ctrip.com<http://www.ctrip.com/>
> > >
> > > From: Saravanakumar Arumugam [mailto:sarumuga at redhat.com]
> > > Sent: Thursday, May 19, 2016 4:33 PM
> > > To: vyyy杨雨阳 <yuyangyang at Ctrip.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>; Kotresh Hiremath Ravishankar <khiremat at redhat.com>
> > > Subject: Re: Re: [Gluster-users] Re: geo-replication status partial faulty
> > >
> > > Hi,
> > > +geo-rep team.
> > >
> > > Can you get the gluster version you are using?
> > >
> > > # For example:
> > > rpm -qa | grep gluster
> > >
> > > I hope you have the same gluster version installed everywhere.
> > > Please double check and share the same.
> > >
> > > Thanks,
> > > Saravana
> > >
> > > On 05/19/2016 01:37 PM, vyyy杨雨阳 wrote:
> > > Hi, Saravana
> > >
> > > I have changed the log level to DEBUG, then started geo-replication with the log-file option; the file is attached.
> > >
> > > gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave start --log-file=geo.log
> > >
> > > I have checked /root/.ssh/authorized_keys on glusterfs01.sh3.ctripcorp.com. It has the entries from /var/lib/glusterd/geo-replication/common_secret.pem.pub, and I have removed the lines not starting with "command=".
> > >
> > > With ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com I can see gsyncd messages and no ssh error.
> > >
> > > The attached etc-glusterfs-glusterd.vol.log from a faulty node shows:
> > >
> > > [2016-05-19 06:39:23.405974] I [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > [2016-05-19 06:39:23.541169] E [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file
> > > [2016-05-19 06:39:23.541210] E [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable to read the statusfile for /export/sdb/filews brick for filews(master), glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
> > > [2016-05-19 06:39:29.472047] I [glusterd-geo-rep.c:1835:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > [2016-05-19 06:39:34.939709] I [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > > [2016-05-19 06:39:35.058520] E [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file
> > >
> > > /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log shows the following:
> > >
> > > [2016-05-19 15:11:37.307755] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
> > > [2016-05-19 15:11:37.308059] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
> > > [2016-05-19 15:11:37.423320] D [gsyncd(agent):627:main_i] <top>: rpc_fd: '7,11,10,9'
> > > [2016-05-19 15:11:37.423882] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
> > > [2016-05-19 15:11:37.423906] I [monitor(monitor):267:monitor] Monitor: worker(/export/sdb/filews) died before establishing connection
> > > [2016-05-19 15:11:37.424151] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
> > > [2016-05-19 15:11:37.424335] I [syncdutils(agent):214:finalize] <top>: exiting.
> > >
> > > Best Regards
> > > Yuyang Yang
> > >
> > > From: Saravanakumar Arumugam [mailto:sarumuga at redhat.com]
> > > Sent: Thursday, May 19, 2016 1:59 PM
> > > To: vyyy杨雨阳 <yuyangyang at Ctrip.com>; Gluster-users at gluster.org
> > > Subject: Re: [Gluster-users] Re: geo-replication status partial faulty
> > >
> > > Hi,
> > >
> > > There seems to be some issue on the glusterfs01.sh3.ctripcorp.com slave node.
> > > Can you share the complete logs?
> > >
> > > You can increase the verbosity of debug messages like this:
> > > gluster volume geo-replication <master volume> <slave host>::<slave volume> config log-level DEBUG
> > >
> > > Also, check /root/.ssh/authorized_keys on glusterfs01.sh3.ctripcorp.com. It should have the entries from /var/lib/glusterd/geo-replication/common_secret.pem.pub (present on the master node).
> > >
> > > Have a look at this one for example:
> > > https://www.gluster.org/pipermail/gluster-users/2015-August/023174.html
> > >
> > > Thanks,
> > > Saravana
> > >
> > > On 05/19/2016 07:53 AM, vyyy杨雨阳 wrote:
> > > Hello,
> > >
> > > I have tried to configure a geo-replication volume; the configuration on all the master nodes is the same. When I start this volume, the status shows some nodes as faulty, as follows:
> > >
> > > gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave status
> > >
> > > MASTER NODE      MASTER VOL    MASTER BRICK          SLAVE                                          STATUS     CHECKPOINT STATUS    CRAWL STATUS
> > > -------------------------------------------------------------------------------------------------------------------------------------------------
> > > SVR8048HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SVR8050HW2285    filews        /export/sdb/filews    glusterfs03.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > SVR8047HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > > SVR8049HW2285    filews        /export/sdb/filews    glusterfs05.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > > SH02SVR5951      filews        /export/sdb/brick1    glusterfs06.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > SH02SVR5953      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SVR6995HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SH02SVR5954      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SVR6994HW2285    filews        /export/sdb/filews    glusterfs02.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > > SVR6993HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SH02SVR5952      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > > SVR6996HW2285    filews        /export/sdb/filews    glusterfs04.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > >
> > > On the faulty node, the log file under /var/log/glusterfs/geo-replication/filews shows "worker(/export/sdb/filews) died before establishing connection":
> > >
> > > [2016-05-18 16:55:46.402622] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
> > > [2016-05-18 16:55:46.402930] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
> > > [2016-05-18 16:55:46.517460] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
> > > [2016-05-18 16:55:46.518066] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
> > > [2016-05-18 16:55:46.518279] I [syncdutils(agent):214:finalize] <top>: exiting.
> > > [2016-05-18 16:55:46.518194] I [monitor(monitor):267:monitor] Monitor: worker(/export/sdb/filews) died before establishing connection
> > > [2016-05-18 16:55:56.697036] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
> > >
> > > Any advice and suggestions will be greatly appreciated.
> > >
> > > Best Regards
> > > Yuyang Yang
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users at gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-users
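The checks suggested in the thread above can be condensed into a short sequence. The following is a minimal sketch, not a transcript from the thread: it reuses the hostnames and paths shown in this setup (filews, glusterfs01.sh3.ctripcorp.com::filews_slave), and the exact command= wrapper that push-pem writes into authorized_keys varies by GlusterFS version.

# On each faulty master node: the session key must log in without a password and,
# per Kotresh's note above, should execute gsyncd rather than drop into an interactive shell.
ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com

# Still on the master: only one <master_vol>_<slave_host>_<slave_vol> directory is expected;
# an extra one indicates a stale session created against a different slave host.
find /var/lib/glusterd/geo-replication -maxdepth 1 -type d

# Rough consistency check of key distribution: the number of command=-restricted entries
# on the slave should match the number of keys in the master's common_secret.pem.pub.
grep -c 'command=' /root/.ssh/authorized_keys              # run on the slave node
wc -l /var/lib/glusterd/geo-replication/common_secret.pem.pub   # run on a master node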
vyyy杨雨阳
2016-May-25 01:41 UTC
[Gluster-users] Re: Re: Re: Re: Re: Re: geo-replication status partial faulty
The command outputs are as follows. Thanks.

[root at SVR8048HW2285 ~]# gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave status

MASTER NODE      MASTER VOL    MASTER BRICK          SLAVE                                          STATUS     CHECKPOINT STATUS    CRAWL STATUS
-------------------------------------------------------------------------------------------------------------------------------------------------
SVR8048HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SH02SVR5954      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SH02SVR5951      filews        /export/sdb/brick1    glusterfs06.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
SVR8050HW2285    filews        /export/sdb/filews    glusterfs03.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
SVR8049HW2285    filews        /export/sdb/filews    glusterfs05.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
SVR8047HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
SVR6995HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SVR6993HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SH02SVR5953      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SH02SVR5952      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SVR6996HW2285    filews        /export/sdb/filews    glusterfs04.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
SVR6994HW2285    filews        /export/sdb/filews    glusterfs02.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A

[root at SVR8048HW2285 ~]# ls -l /var/lib/glusterd/geo-replication/
total 40
-rw------- 1 root root 14140 May 20 16:00 common_secret.pem.pub
drwxr-xr-x 2 root root  4096 May 25 09:35 filews_glusterfs01.sh3.ctripcorp.com_filews_slave
-rwxr-xr-x 1 root root  1845 May 17 15:04 gsyncd_template.conf
-rw------- 1 root root  1675 May 20 11:03 secret.pem
-rw-r--r-- 1 root root   400 May 20 11:03 secret.pem.pub
-rw------- 1 root root  1675 May 20 16:00 tar_ssh.pem
-rw-r--r-- 1 root root   400 May 20 16:00 tar_ssh.pem.pub
[root at SVR8048HW2285 ~]#

Best Regards
杨雨阳 Yuyang Yang

OPS
Ctrip Infrastructure Service (CIS)
Ctrip Computer Technology (Shanghai) Co., Ltd
Phone: +86 21 34064880-15554 | Fax: +86 21 52514588-13389
Web: www.Ctrip.com

-----Original Message-----
From: Kotresh Hiremath Ravishankar [mailto:khiremat at redhat.com]
Sent: Tuesday, May 24, 2016 6:41 PM
To: vyyy杨雨阳 <yuyangyang at Ctrip.com>
Cc: Saravanakumar Arumugam <sarumuga at redhat.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>
Subject: Re: Re: Re: Re: Re: [Gluster-users] Re: geo-replication status partial faulty

Ok, it looks like there is a problem with ssh key distribution. Before I suggest cleaning those up and doing the setup again, could you share the output of the following commands?

1. gluster vol geo-rep <master_vol> <slave_host>::<slave_vol> status
2. ls -l /var/lib/glusterd/geo-replication/

Are there multiple geo-rep sessions from this master volume, or only one?

Thanks and Regards,
Kotresh H R
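The directory listing above shows only one session directory, so the problem is more likely stale or incompletely distributed keys than a second session. For reference, the cleanup-and-redo that Kotresh alludes to ("clean those up and do setup again") commonly looks like the sketch below. This is an illustrative sequence using the volume and slave names from this thread, not a procedure prescribed anywhere in it; gsec_create rebuilds common_secret.pem.pub from every master node's secret.pem.pub, and push-pem redistributes it to the slave cluster.

# Run on one master node of the filews volume.

# Stop the existing session before touching the keys.
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave stop

# Regenerate the combined public-key file from all master nodes.
gluster system:: execute gsec_create

# Recreate the session so push-pem copies the keys to the slave nodes again.
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force

# Restart and verify that the previously faulty bricks come up Active/Passive.
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave start
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave status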