thr3ads.net - Gluster users - [Gluster-users] Geo-Replication Issue while upgrading [Nov 2019]

Sunny Kumar
2019-Nov-28 12:09 UTC
[Gluster-users] Geo-Replication Issue while upgrading

Hi Deepu,

It looks like this error is caused by SSH restrictions.
Can you please check and confirm that SSH is properly configured? The relevant log entries:


[2019-11-28 11:59:12.934436] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
**************************************************************************************************************************

[2019-11-28 11:59:12.934703] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING:
This system is a restricted access system.  All activity on this
system is subject to monitoring.  If information collected reveals
possible criminal activity or activity that exceeds privileges,
evidence of such activity may be providedto the relevant authorities
for further action.

[2019-11-28 11:59:12.934967] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By
continuing past this point, you expressly consent to   this
monitoring.- ZOHO Corporation

[2019-11-28 11:59:12.935194] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
**************************************************************************************************************************

[2019-11-28 11:59:12.944369] I [repce(agent
/home/sas/gluster/data/code-misc):97:service_loop] RepceServer:
terminating on reaching EOF.
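
One way to check the SSH side in isolation is to run a connection with the same authentication options that appear in the failing command and look at the exit status separately from the banner text on stderr. The following is only a minimal sketch: the function names and the `true` remote command are hypothetical placeholders, the key path, port, user and host are copied from the log in this thread, and the `-oControlMaster`/`-S` socket options are left out on purpose.

```python
import subprocess


def build_ssh_check_argv(user, host,
                         key="/var/lib/glusterd/geo-replication/secret.pem",
                         port=22, remote_cmd="true"):
    """Build an ssh argv using the auth options seen in the gsyncd log.

    remote_cmd is a placeholder chosen for illustration; it is not what
    gsyncd itself runs on the slave.
    """
    return [
        "ssh",
        "-oPasswordAuthentication=no",
        "-oStrictHostKeyChecking=no",
        "-i", key,
        "-p", str(port),
        f"{user}@{host}",
        remote_cmd,
    ]


def run_check(argv, timeout=30):
    """Run the command and return (exit status, stderr text).

    A login banner is printed on stderr even when the connection works,
    so the exit status is the value to look at first.
    """
    proc = subprocess.run(argv, capture_output=True, text=True,
                          timeout=timeout)
    return proc.returncode, proc.stderr
```

On the master node this could be called as `run_check(build_ssh_check_argv("sas", "192.168.185.84"))`. A non-zero status together with only the banner on stderr would suggest the failure happens after login rather than in the banner itself, but that is an assumption to verify, not a diagnosis.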

/sunny

On Thu, Nov 28, 2019 at 12:03 PM deepu srinivasan <sdeepugd at gmail.com>
wrote:
>
> ---------- Forwarded message ---------
> From: deepu srinivasan <sdeepugd at gmail.com>
> Date: Thu, Nov 28, 2019 at 5:32 PM
> Subject: Geo-Replication Issue while upgrading
> To: gluster-users <gluster-users at gluster.org>
>
>
> Hi Users/Developers
> I hope you remember the last issue we faced, in which geo-replication went into the Faulty state when the session was stopped and started again.
>>
>> [2019-11-16 17:29:43.536881] I [gsyncdstatus(worker
/home/sas/gluster/data/code-misc6):281:set_active] GeorepStatus: Worker Status
Change       status=Active
>> [2019-11-16 17:29:43.629620] I [gsyncdstatus(worker
/home/sas/gluster/data/code-misc6):253:set_worker_crawl_status] GeorepStatus:
Crawl Status Change   status=History Crawl
>> [2019-11-16 17:29:43.630328] I [master(worker
/home/sas/gluster/data/code-misc6):1517:crawl] _GMaster: starting history crawl 
turns=1 stime=(1573924576, 0)   entry_stime=(1573924576, 0)     etime=1573925383
>> [2019-11-16 17:29:44.636725] I [master(worker
/home/sas/gluster/data/code-misc6):1546:crawl] _GMaster: slave's time    
stime=(1573924576, 0)
>> [2019-11-16 17:29:44.778966] I [master(worker
/home/sas/gluster/data/code-misc6):898:fix_possible_entry_failures] _GMaster:
Fixing ENOENT error in slave. Parent does not exist on master. Safe to ignore,
take out entry       retry_count=1   entry=({'uid': 0, 'gfid':
'c02519e0-0ead-4fe8-902b-dcae72ef83a3', 'gid': 0,
'mode': 33188, 'entry':
'.gfid/d60aa0d5-4fdf-4721-97dc-9e3e50995dab/368307802', 'op':
'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch':
False, 'slave_name': None, 'slave_gfid': None,
'name_mismatch': False, 'dst': False})
>> [2019-11-16 17:29:44.779306] I [master(worker
/home/sas/gluster/data/code-misc6):942:handle_entry_failures] _GMaster:
Sucessfully fixed entry ops with gfid mismatch    retry_count=1
>> [2019-11-16 17:29:44.779516] I [master(worker
/home/sas/gluster/data/code-misc6):1194:process_change] _GMaster: Retry original
entries. count = 1
>> [2019-11-16 17:29:44.879321] E [repce(worker
/home/sas/gluster/data/code-misc6):214:__call__] RepceClient: call failed 
call=151945:140353273153344:1573925384.78       method=entry_ops       
error=OSError
>> [2019-11-16 17:29:44.879750] E [syncdutils(worker
/home/sas/gluster/data/code-misc6):338:log_raise_exception] <top>: FAIL:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py",
line 322, in main
>>     func(args)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py",
line 82, in subcmd_worker
>>     local.service_loop(remote)
>>   File
"/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1277, in
service_loop
>>     g3.crawlwrap(oneshot=True)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py",
line 599, in crawlwrap
>>     self.crawl()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py",
line 1555, in crawl
>>     self.changelogs_batch_process(changes)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py",
line 1455, in changelogs_batch_process
>>     self.process(batch)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py",
line 1290, in process
>>     self.process_change(change, done, retry)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py",
line 1195, in process_change
>>     failures = self.slave.server.entry_ops(entries)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py",
line 233, in __call__
>>     return self.ins(self.meth, *a)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py",
line 215, in __call__
>>     raise res
>> OSError: [Errno 13] Permission denied:
'/home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb'
>> [2019-11-16 17:29:44.911767] I [repce(agent
/home/sas/gluster/data/code-misc6):97:service_loop] RepceServer: terminating on
reaching EOF.
>> [2019-11-16 17:29:45.509344] I [monitor(monitor):278:monitor] Monitor:
worker died in startup phase     brick=/home/sas/gluster/data/code-misc6
>> [2019-11-16 17:29:45.511806] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change
status=Faulty
>
>
>
>
> Now, after upgrading from version 5.6 to 7.0, we are getting an error in geo-replication.
> Scenario:
>
> We had a 1x3 replicated and distributed volume in each DC.
> Both volumes were started, a geo-replication session was set up between them, and the files were synced. The geo-replication session was then deleted.
> We then upgraded each server to 7.0, starting from the slave end. I followed this link -->
https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/
> After starting the glusterd process, we created the geo-replication session again, but it ends up in a Faulty state. Please find the logs below.
>
>> [2019-11-28 11:59:12.370255] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change
status=Initializing...
>>
>> [2019-11-28 11:59:12.370615] I [monitor(monitor):159:monitor] Monitor:
starting gsyncd worker brick=/home/sas/gluster/data/code-misc
slave_node=192.168.185.84
>>
>> [2019-11-28 11:59:12.445581] I [gsyncd(agent
/home/sas/gluster/data/code-misc):311:main] <top>: Using session config
file
path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.118_code-misc/gsyncd.conf
>>
>> [2019-11-28 11:59:12.448383] I [changelogagent(agent
/home/sas/gluster/data/code-misc):72:__init__] ChangelogAgent: Agent
listining...
>>
>> [2019-11-28 11:59:12.453881] I [gsyncd(worker
/home/sas/gluster/data/code-misc):311:main] <top>: Using session config
file
path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.118_code-misc/gsyncd.conf
>>
>> [2019-11-28 11:59:12.472862] I [resource(worker
/home/sas/gluster/data/code-misc):1386:connect_remote] SSH: Initializing SSH
connection between master and slave...
>>
>> [2019-11-28 11:59:12.933346] E [syncdutils(worker
/home/sas/gluster/data/code-misc):311:log_raise_exception] <top>:
connection to peer is broken
>>
>> [2019-11-28 11:59:12.934117] E [syncdutils(worker
/home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error
cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-tKcFQe/5697733f424862ab9d57e019de78aca6.sock sas at
192.168.185.84 /usr/libexec/glusterfs/gsyncd slave code-misc sas at
192.168.185.118::code-misc --master-node 192.168.185.89 --master-node-id
a7a9688e-700c-4452-9cd6-e10d6eed5335 --master-brick
/home/sas/gluster/data/code-misc --local-node 192.168.185.84 --local-node-id
cbafeca3-650b-4c9e-8ea6-2451ea9265dd --slave-timeout 120 --slave-log-level INFO
--slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin
--master-dist-count 3 error=1
>>
>> [2019-11-28 11:59:12.934436] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
**************************************************************************************************************************
>>
>> [2019-11-28 11:59:12.934703] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING: This
system is a restricted access system.  All activity on this system is subject to
monitoring.  If information collected reveals possible criminal activity or
activity that exceeds privileges, evidence of such activity may be providedto
the relevant authorities for further action.
>>
>> [2019-11-28 11:59:12.934967] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By continuing past
this point, you expressly consent to   this monitoring.- ZOHO Corporation
>>
>> [2019-11-28 11:59:12.935194] E [syncdutils(worker
/home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
**************************************************************************************************************************
>>
>> [2019-11-28 11:59:12.944369] I [repce(agent
/home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on
reaching EOF.
>>
>> [2019-11-28 11:59:12.944722] I [monitor(monitor):280:monitor] Monitor:
worker died in startup phase brick=/home/sas/gluster/data/code-misc
>>
>> [2019-11-28 11:59:12.947575] I
[gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change
status=Faulty
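
Most of the entries above are informational, so it can help to filter the geo-replication log down to the error-level lines. The sketch below assumes each entry sits on a single line in the real log file (the wrapping here comes from the mail client) and follows the `[timestamp] LEVEL [origin] message` layout visible in this thread. The regular expression and function names are illustrative only and are not part of gsyncd.

```python
import re

# Layout assumed from the entries quoted in this thread:
# [timestamp] LEVEL [module(role brick):line:function] message
ENTRY_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<origin>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)


def parse_entry(line):
    """Return the fields of one log entry as a dict, or None if the line
    does not match the assumed layout (for example a traceback line)."""
    match = ENTRY_RE.match(line.strip())
    return match.groupdict() if match else None


def error_entries(lines):
    """Keep only the entries logged at level E."""
    parsed = (parse_entry(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "E"]
```

Passing the lines of the worker log to `error_entries` would leave the `connection to peer is broken` entry, the failed ssh command, and the `Popen: ssh>` stderr lines, without the status-change messages around them.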
>
>