vyyy杨雨阳
2016-May-19 08:45 UTC
[Gluster-users] Re: Re: Re: geo-replication status partial faulty
I have checked all the nodes, both masters and slaves; the software is the same. I am puzzled why half of the masters work and half are faulty.

[admin@SVR6996HW2285 ~]$ rpm -qa | grep gluster
glusterfs-api-3.6.3-1.el6.x86_64
glusterfs-fuse-3.6.3-1.el6.x86_64
glusterfs-geo-replication-3.6.3-1.el6.x86_64
glusterfs-3.6.3-1.el6.x86_64
glusterfs-cli-3.6.3-1.el6.x86_64
glusterfs-server-3.6.3-1.el6.x86_64
glusterfs-libs-3.6.3-1.el6.x86_64

Best Regards
杨雨阳 Yuyang Yang

OPS
Ctrip Infrastructure Service (CIS)
Ctrip Computer Technology (Shanghai) Co., Ltd
Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389
Web: www.Ctrip.com

From: Saravanakumar Arumugam [mailto:sarumuga at redhat.com]
Sent: Thursday, May 19, 2016 4:33 PM
To: vyyy杨雨阳 <yuyangyang at Ctrip.com>; Gluster-users at gluster.org; Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com>; Kotresh Hiremath Ravishankar <khiremat at redhat.com>
Subject: Re: Re: [Gluster-users] Re: geo-replication status partial faulty

Hi,
+geo-rep team.

Can you get the gluster version you are using?

# For example:
rpm -qa | grep gluster

I hope you have the same gluster version installed everywhere.
Please double check and share the same.

Thanks,
Saravana

On 05/19/2016 01:37 PM, vyyy杨雨阳 wrote:

Hi, Saravana

I have changed the log level to DEBUG, then started geo-replication with the log-file option; the file is attached.

gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave start --log-file=geo.log

I have checked /root/.ssh/authorized_keys on glusterfs01.sh3.ctripcorp.com. It has the entries from /var/lib/glusterd/geo-replication/common_secret.pem.pub, and I have removed the lines that do not start with "command=".

Running

ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com

I can see gsyncd messages and no ssh error.
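For reference, one quick way to confirm that every key in the master's common_secret.pem.pub really made it into the slave's authorized_keys is a sketch like the one below; it assumes root SSH access from a master node to the slave host and a throwaway copy under /tmp, and uses only the file paths already mentioned in this thread:

# run on a master node; prints any key from common_secret.pem.pub that is
# missing from the slave's /root/.ssh/authorized_keys
ssh root@glusterfs01.sh3.ctripcorp.com 'cat /root/.ssh/authorized_keys' > /tmp/slave_keys
grep -F -v -f /tmp/slave_keys /var/lib/glusterd/geo-replication/common_secret.pem.pub \
    || echo "all keys from common_secret.pem.pub are present on the slave"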
The attached etc-glusterfs-glusterd.vol.log from a faulty node shows:

[2016-05-19 06:39:23.405974] I [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
[2016-05-19 06:39:23.541169] E [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file
[2016-05-19 06:39:23.541210] E [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable to read the statusfile for /export/sdb/filews brick for filews(master), glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
[2016-05-19 06:39:29.472047] I [glusterd-geo-rep.c:1835:glusterd_get_statefile_name] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
[2016-05-19 06:39:34.939709] I [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed config template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
[2016-05-19 06:39:35.058520] E [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file

/var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log shows the following:

[2016-05-19 15:11:37.307755] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
[2016-05-19 15:11:37.308059] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
[2016-05-19 15:11:37.423320] D [gsyncd(agent):627:main_i] <top>: rpc_fd: '7,11,10,9'
[2016-05-19 15:11:37.423882] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2016-05-19 15:11:37.423906] I [monitor(monitor):267:monitor] Monitor: worker(/export/sdb/filews) died before establishing connection
[2016-05-19 15:11:37.424151] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-05-19 15:11:37.424335] I [syncdutils(agent):214:finalize] <top>: exiting.

Best Regards
Yuyang Yang

From: Saravanakumar Arumugam [mailto:sarumuga at redhat.com]
Sent: Thursday, May 19, 2016 1:59 PM
To: vyyy杨雨阳 <yuyangyang at Ctrip.com>; Gluster-users at gluster.org
Subject: Re: [Gluster-users] Re: geo-replication status partial faulty

Hi,

There seems to be some issue on the glusterfs01.sh3.ctripcorp.com slave node. Can you share the complete logs?

You can increase the verbosity of debug messages like this:

gluster volume geo-replication <master volume> <slave host>::<slave volume> config log-level DEBUG

Also, check /root/.ssh/authorized_keys on glusterfs01.sh3.ctripcorp.com. It should have the entries from /var/lib/glusterd/geo-replication/common_secret.pem.pub (present on the master node).

Have a look at this one for example:
https://www.gluster.org/pipermail/gluster-users/2015-August/023174.html

Thanks,
Saravana
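Applied to the session in this thread, the placeholder command above becomes something like the following (run on a master node; the per-session log path is the one quoted earlier):

# raise the log level for this geo-replication session
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave config log-level DEBUG

# then follow the per-session log on the faulty master node
tail -f '/var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log'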
On 05/19/2016 07:53 AM, vyyy杨雨阳 wrote:

Hello,

I have tried to configure a geo-replication volume; all the master nodes have the same configuration. When I start this volume, the status shows some bricks faulty, as follows:

gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave status

MASTER NODE      MASTER VOL    MASTER BRICK          SLAVE                                          STATUS     CHECKPOINT STATUS    CRAWL STATUS
-------------------------------------------------------------------------------------------------------------------------------------------------
SVR8048HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SVR8050HW2285    filews        /export/sdb/filews    glusterfs03.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
SVR8047HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
SVR8049HW2285    filews        /export/sdb/filews    glusterfs05.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
SH02SVR5951      filews        /export/sdb/brick1    glusterfs06.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
SH02SVR5953      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SVR6995HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SH02SVR5954      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SVR6994HW2285    filews        /export/sdb/filews    glusterfs02.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
SVR6993HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SH02SVR5952      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
SVR6996HW2285    filews        /export/sdb/filews    glusterfs04.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A

On the faulty node, the log file under /var/log/glusterfs/geo-replication/filews shows "worker(/export/sdb/filews) died before establishing connection":

[2016-05-18 16:55:46.402622] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------
[2016-05-18 16:55:46.402930] I [monitor(monitor):216:monitor] Monitor: starting gsyncd worker
[2016-05-18 16:55:46.517460] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2016-05-18 16:55:46.518066] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-05-18 16:55:46.518279] I [syncdutils(agent):214:finalize] <top>: exiting.
[2016-05-18 16:55:46.518194] I [monitor(monitor):267:monitor] Monitor: worker(/export/sdb/filews) died before establishing connection
[2016-05-18 16:55:56.697036] I [monitor(monitor):215:monitor] Monitor: ------------------------------------------------------------

Any advice and suggestions will be greatly appreciated.

Best Regards
杨雨阳 Yuyang Yang

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
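Since the same packages are expected on every node (as requested above), a small loop can compare the gluster versions across masters and slaves in one pass; a rough sketch, assuming passwordless admin SSH to each node and using a few hostnames from the status output above purely as examples:

# node list is illustrative; extend it to all master and slave nodes
for h in SVR8047HW2285 SVR8048HW2285 SH02SVR5953 \
         glusterfs01.sh3.ctripcorp.com glusterfs03.sh3.ctripcorp.com; do
    echo "== $h =="
    ssh "$h" 'rpm -qa | grep gluster | sort'
done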
Kotresh Hiremath Ravishankar
2016-May-19 09:06 UTC
[Gluster-users] Re: Re: Re: geo-replication status partial faulty
Hi,

Could you just try 'create force' once to fix those status file errors? e.g.,

gluster volume geo-rep <master vol> <slave host>::<slave vol> create push-pem force

Thanks and Regards,
Kotresh H R
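Filled in with the volume and slave names used in this thread, the suggested command would look roughly like this (run from a master node, followed by a status check to see whether the faulty bricks recover):

gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave status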