Hi Deepu,

It looks like this error is caused by SSH restrictions. Can you please check and confirm that SSH is properly configured?

[2019-11-28 11:59:12.934436] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
[2019-11-28 11:59:12.934703] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING: This system is a restricted access system. All activity on this system is subject to monitoring. If information collected reveals possible criminal activity or activity that exceeds privileges, evidence of such activity may be providedto the relevant authorities for further action.
[2019-11-28 11:59:12.934967] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By continuing past this point, you expressly consent to this monitoring.- ZOHO Corporation
[2019-11-28 11:59:12.935194] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
[2019-11-28 11:59:12.944369] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF.

/sunny

On Thu, Nov 28, 2019 at 12:03 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>
> ---------- Forwarded message ---------
> From: deepu srinivasan <sdeepugd at gmail.com>
> Date: Thu, Nov 28, 2019 at 5:32 PM
> Subject: Geo-Replication Issue while upgrading
> To: gluster-users <gluster-users at gluster.org>
>
> Hi Users/Developers
> I hope you remember the last issue we faced, where geo-replication went into the Faulty state while stopping and starting the geo-replication session.
>>
>> [2019-11-16 17:29:43.536881] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):281:set_active] GeorepStatus: Worker Status Change status=Active
>> [2019-11-16 17:29:43.629620] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
>> [2019-11-16 17:29:43.630328] I [master(worker /home/sas/gluster/data/code-misc6):1517:crawl] _GMaster: starting history crawl turns=1 stime=(1573924576, 0) entry_stime=(1573924576, 0) etime=1573925383
>> [2019-11-16 17:29:44.636725] I [master(worker /home/sas/gluster/data/code-misc6):1546:crawl] _GMaster: slave's time stime=(1573924576, 0)
>> [2019-11-16 17:29:44.778966] I [master(worker /home/sas/gluster/data/code-misc6):898:fix_possible_entry_failures] _GMaster: Fixing ENOENT error in slave. Parent does not exist on master. Safe to ignore, take out entry retry_count=1 entry=({'uid': 0, 'gfid': 'c02519e0-0ead-4fe8-902b-dcae72ef83a3', 'gid': 0, 'mode': 33188, 'entry': '.gfid/d60aa0d5-4fdf-4721-97dc-9e3e50995dab/368307802', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
>> [2019-11-16 17:29:44.779306] I [master(worker /home/sas/gluster/data/code-misc6):942:handle_entry_failures] _GMaster: Sucessfully fixed entry ops with gfid mismatch retry_count=1
>> [2019-11-16 17:29:44.779516] I [master(worker /home/sas/gluster/data/code-misc6):1194:process_change] _GMaster: Retry original entries. count = 1
>> [2019-11-16 17:29:44.879321] E [repce(worker /home/sas/gluster/data/code-misc6):214:__call__] RepceClient: call failed call=151945:140353273153344:1573925384.78 method=entry_ops error=OSError
>> [2019-11-16 17:29:44.879750] E [syncdutils(worker /home/sas/gluster/data/code-misc6):338:log_raise_exception] <top>: FAIL:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 322, in main
>>     func(args)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 82, in subcmd_worker
>>     local.service_loop(remote)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1277, in service_loop
>>     g3.crawlwrap(oneshot=True)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 599, in crawlwrap
>>     self.crawl()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1555, in crawl
>>     self.changelogs_batch_process(changes)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1455, in changelogs_batch_process
>>     self.process(batch)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1290, in process
>>     self.process_change(change, done, retry)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1195, in process_change
>>     failures = self.slave.server.entry_ops(entries)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 233, in __call__
>>     return self.ins(self.meth, *a)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 215, in __call__
>>     raise res
>> OSError: [Errno 13] Permission denied: '/home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb'
>> [2019-11-16 17:29:44.911767] I [repce(agent /home/sas/gluster/data/code-misc6):97:service_loop] RepceServer: terminating on reaching EOF.
>> [2019-11-16 17:29:45.509344] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc6
>> [2019-11-16 17:29:45.511806] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>
> Now, after upgrading from version 5.6 to 7.0, we get an error in geo-replication.
> Scenario:
>
> We had a 1x3 replicated and distributed volume in each DC.
> Both volumes were started, a geo-replication session was set up between them, and the files were synced. The geo-replication session was then deleted.
> We upgraded each server to 7.0, starting from the slave end. I followed this link --> https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/
> After starting the glusterd process, we created the geo-replication session again, but it ends up in a Faulty state. Please find the logs below:
>
>> [2019-11-28 11:59:12.370255] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
>> [2019-11-28 11:59:12.370615] I [monitor(monitor):159:monitor] Monitor: starting gsyncd worker brick=/home/sas/gluster/data/code-misc slave_node=192.168.185.84
>> [2019-11-28 11:59:12.445581] I [gsyncd(agent /home/sas/gluster/data/code-misc):311:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.118_code-misc/gsyncd.conf
>> [2019-11-28 11:59:12.448383] I [changelogagent(agent /home/sas/gluster/data/code-misc):72:__init__] ChangelogAgent: Agent listining...
>> [2019-11-28 11:59:12.453881] I [gsyncd(worker /home/sas/gluster/data/code-misc):311:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.118_code-misc/gsyncd.conf
>> [2019-11-28 11:59:12.472862] I [resource(worker /home/sas/gluster/data/code-misc):1386:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2019-11-28 11:59:12.933346] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] <top>: connection to peer is broken
>> [2019-11-28 11:59:12.934117] E [syncdutils(worker /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-tKcFQe/5697733f424862ab9d57e019de78aca6.sock sas@192.168.185.84 /usr/libexec/glusterfs/gsyncd slave code-misc sas@192.168.185.118::code-misc --master-node 192.168.185.89 --master-node-id a7a9688e-700c-4452-9cd6-e10d6eed5335 --master-brick /home/sas/gluster/data/code-misc --local-node 192.168.185.84 --local-node-id cbafeca3-650b-4c9e-8ea6-2451ea9265dd --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin --master-dist-count 3 error=1
>> [2019-11-28 11:59:12.934436] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
>> [2019-11-28 11:59:12.934703] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING: This system is a restricted access system. All activity on this system is subject to monitoring. If information collected reveals possible criminal activity or activity that exceeds privileges, evidence of such activity may be providedto the relevant authorities for further action.
>> [2019-11-28 11:59:12.934967] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By continuing past this point, you expressly consent to this monitoring.- ZOHO Corporation
>> [2019-11-28 11:59:12.935194] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
>> [2019-11-28 11:59:12.944369] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF.
>> [2019-11-28 11:59:12.944722] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc
>> [2019-11-28 11:59:12.947575] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
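When checking the SSH setup, it may help to pull just the error entries and the text that ssh wrote to stderr out of the worker log. The following is a minimal Python sketch, not part of Gluster: the names `parse_line`, `error_entries`, and `ssh_stderr` are made up for illustration, and the line layout (`[timestamp] LEVEL [source] message`) is assumed from the log lines quoted in this thread.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout, taken from the quoted log lines:
#   [timestamp] LEVEL [module(role brick):lineno:function] message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<source>[^\]]+)\]\s+(?P<message>.*)$"
)

# Prefix seen on lines that relay ssh's stderr output in the quoted log.
SSH_PREFIX = "Popen: ssh> "


class Entry(NamedTuple):
    timestamp: str
    level: str
    source: str
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one log line; return None for anything that does not match."""
    # Drop leading email quote markers ('>' and spaces) and trailing whitespace.
    cleaned = line.lstrip("> ").rstrip()
    match = LOG_LINE.match(cleaned)
    return Entry(**match.groupdict()) if match else None


def error_entries(lines: Iterable[str]) -> List[Entry]:
    """Return only the entries logged at level 'E'."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry.level == "E"]


def ssh_stderr(entries: Iterable[Entry]) -> List[str]:
    """Return the text that ssh printed, without the 'Popen: ssh> ' prefix."""
    return [
        entry.message[len(SSH_PREFIX):]
        for entry in entries
        if entry.message.startswith(SSH_PREFIX)
    ]


# Three lines copied from the log quoted above.
SAMPLE = [
    "[2019-11-28 11:59:12.472862] I [resource(worker /home/sas/gluster/data/code-misc):1386:connect_remote] SSH: Initializing SSH connection between master and slave...",
    "[2019-11-28 11:59:12.933346] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] <top>: connection to peer is broken",
    "[2019-11-28 11:59:12.934967] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By continuing past this point, you expressly consent to this monitoring.- ZOHO Corporation",
]

if __name__ == "__main__":
    for entry in error_entries(SAMPLE):
        print(entry.timestamp, entry.message)
```

If `ssh_stderr` returns only login banner text, as it does for the lines above, the log is showing what ssh printed but not why the command exited with an error, so the SSH configuration itself still needs to be checked by hand.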