Thanks Deepu.

I will investigate this. Can you summarize the steps needed to reproduce the issue?

/sunny

On Fri, Nov 29, 2019 at 7:29 AM deepu srinivasan <sdeepugd at gmail.com> wrote:
>
> Hi Sunny
> The issue seems to be a bug.
> The issue was fixed when I restarted the glusterd daemon on the slave machines. The logs on the slave end reported that the mount-broker folder was not in the vol file. So when I restarted the machine it got fixed.
> This might be a race condition.
>
> On Thu, Nov 28, 2019 at 9:00 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>
>> Hi Sunny
>> I also got this error on the slave end
>>>
>>> [2019-11-28 15:30:12.520461] I [resource(slave 192.168.185.89/home/sas/gluster/data/code-misc):1105:connect] GLUSTER: Mounting gluster volume locally...
>>>
>>> [2019-11-28 15:30:12.649425] E [resource(slave 192.168.185.89/home/sas/gluster/data/code-misc):1013:handle_mounter] MountbrokerMounter: glusterd answered mnt
>>>
>>> [2019-11-28 15:30:12.650573] E [syncdutils(slave 192.168.185.89/home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=/usr/sbin/gluster --remote-host=localhost system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.118_code-misc/mnt-192.168.185.89-home-sas-gluster-data-code-misc.log volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1
>>>
>>> [2019-11-28 15:30:12.650742] E [syncdutils(slave 192.168.185.89/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
>>
>>
>> On Thu, Nov 28, 2019 at 6:45 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>
>>> root at 192.168.185.101/var/log/glusterfs#ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 sas at 192.168.185.118 "sudo gluster volume status"
>>>
>>> **************************************************************************************************************************
>>>
>>> WARNING: This system is a restricted access system. All activity on this system is subject to monitoring. If information collected reveals possible criminal activity or activity that exceeds privileges, evidence of such activity may be providedto the relevant authorities for further action.
>>>
>>> By continuing past this point, you expressly consent to this monitoring
>>>
>>> **************************************************************************************************************************
>>>
>>> invoking sudo in restricted SSH session is not allowed
>>>
>>>
>>> On Thu, Nov 28, 2019 at 6:04 PM Sunny Kumar <sunkumar at redhat.com> wrote:
>>>>
>>>> Hi Deepu,
>>>>
>>>> Can you try this:
>>>>
>>>> ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
>>>> /var/lib/glusterd/geo-replication/secret.pem -p 22
>>>> sas at 192.168.185.118 "sudo gluster volume status"
>>>>
>>>> /sunny
>>>>
>>>>
>>>> On Thu, Nov 28, 2019 at 12:14 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>> >>
>>>> >> MASTER NODE        MASTER VOL    MASTER BRICK                        SLAVE USER    SLAVE                                SLAVE NODE         STATUS     CRAWL STATUS    LAST_SYNCED
>>>> >> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> >> 192.168.185.89     code-misc     /home/sas/gluster/data/code-misc    sas           sas at 192.168.185.118::code-misc    N/A                Faulty     N/A             N/A
>>>> >> 192.168.185.101    code-misc     /home/sas/gluster/data/code-misc    sas           sas at 192.168.185.118::code-misc    192.168.185.118    Passive    N/A             N/A
>>>> >> 192.168.185.93     code-misc     /home/sas/gluster/data/code-misc    sas           sas at 192.168.185.118::code-misc    N/A                Faulty     N/A             N/A
>>>> >
>>>> >
>>>> > On Thu, Nov 28, 2019 at 5:43 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>> >>
>>>> >> I think it's configured properly.
>>>> >> Should I check something else?
>>>> >>
>>>> >> root at 192.168.185.89/var/log/glusterfs#ssh sas at 192.168.185.118 "sudo gluster volume info"
>>>> >>
>>>> >> **************************************************************************************************************************
>>>> >>
>>>> >> WARNING: This system is a restricted access system. All activity on this system is subject to monitoring. If information collected reveals possible criminal activity or activity that exceeds privileges, evidence of such activity may be providedto the relevant authorities for further action.
>>>> >>
>>>> >> By continuing past this point, you expressly consent to this monitoring.-
>>>> >>
>>>> >> **************************************************************************************************************************
>>>> >>
>>>> >> Volume Name: code-misc
>>>> >> Type: Replicate
>>>> >> Volume ID: e9b6fbed-fcd0-42a9-ab11-02ec39c2ee07
>>>> >> Status: Started
>>>> >> Snapshot Count: 0
>>>> >> Number of Bricks: 1 x 3 = 3
>>>> >> Transport-type: tcp
>>>> >> Bricks:
>>>> >> Brick1: 192.168.185.118:/home/sas/gluster/data/code-misc
>>>> >> Brick2: 192.168.185.45:/home/sas/gluster/data/code-misc
>>>> >> Brick3: 192.168.185.84:/home/sas/gluster/data/code-misc
>>>> >> Options Reconfigured:
>>>> >> features.read-only: enable
>>>> >> transport.address-family: inet
>>>> >> nfs.disable: on
>>>> >> performance.client-io-threads: off
>>>> >>
>>>> >>
>>>> >> On Thu, Nov 28, 2019 at 5:40 PM Sunny Kumar <sunkumar at redhat.com> wrote:
>>>> >>>
>>>> >>> Hi Deepu,
>>>> >>>
>>>> >>> Looks like this error is generated due to ssh restrictions.
>>>> >>> Can you please check and confirm ssh is properly configured?
>>>> >>>
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.934436] E [syncdutils(worker
>>>> >>> /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
>>>> >>> **************************************************************************************************************************
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.934703] E [syncdutils(worker
>>>> >>> /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING:
>>>> >>> This system is a restricted access system. All activity on this
>>>> >>> system is subject to monitoring. If information collected reveals
>>>> >>> possible criminal activity or activity that exceeds privileges,
>>>> >>> evidence of such activity may be providedto the relevant authorities
>>>> >>> for further action.
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.934967] E [syncdutils(worker
>>>> >>> /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By
>>>> >>> continuing past this point, you expressly consent to this
>>>> >>> monitoring.- ZOHO Corporation
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.935194] E [syncdutils(worker
>>>> >>> /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh>
>>>> >>> **************************************************************************************************************************
>>>> >>>
>>>> >>> [2019-11-28 11:59:12.944369] I [repce(agent
>>>> >>> /home/sas/gluster/data/code-misc):97:service_loop] RepceServer:
>>>> >>> terminating on reaching EOF.
>>>> >>>
>>>> >>> /sunny
>>>> >>>
>>>> >>> On Thu, Nov 28, 2019 at 12:03 PM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>> > ---------- Forwarded message ---------
>>>> >>> > From: deepu srinivasan <sdeepugd at gmail.com>
>>>> >>> > Date: Thu, Nov 28, 2019 at 5:32 PM
>>>> >>> > Subject: Geo-Replication Issue while upgrading
>>>> >>> > To: gluster-users <gluster-users at gluster.org>
>>>> >>> >
>>>> >>> >
>>>> >>> > Hi Users/Developers
>>>> >>> > I hope you remember the last issue we faced, where geo-replication went into the Faulty state while stopping and starting it.
>>>> >>> >>
>>>> >>> >> [2019-11-16 17:29:43.536881] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):281:set_active] GeorepStatus: Worker Status Change status=Active
>>>> >>> >> [2019-11-16 17:29:43.629620] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
>>>> >>> >> [2019-11-16 17:29:43.630328] I [master(worker /home/sas/gluster/data/code-misc6):1517:crawl] _GMaster: starting history crawl turns=1 stime=(1573924576, 0) entry_stime=(1573924576, 0) etime=1573925383
>>>> >>> >> [2019-11-16 17:29:44.636725] I [master(worker /home/sas/gluster/data/code-misc6):1546:crawl] _GMaster: slave's time stime=(1573924576, 0)
>>>> >>> >> [2019-11-16 17:29:44.778966] I [master(worker /home/sas/gluster/data/code-misc6):898:fix_possible_entry_failures] _GMaster: Fixing ENOENT error in slave. Parent does not exist on master.
Safe to ignore, take out entry retry_count=1 entry=({'uid': 0, 'gfid': 'c02519e0-0ead-4fe8-902b-dcae72ef83a3', 'gid': 0, 'mode': 33188, 'entry': '.gfid/d60aa0d5-4fdf-4721-97dc-9e3e50995dab/368307802', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
>>>> >>> >> [2019-11-16 17:29:44.779306] I [master(worker /home/sas/gluster/data/code-misc6):942:handle_entry_failures] _GMaster: Sucessfully fixed entry ops with gfid mismatch retry_count=1
>>>> >>> >> [2019-11-16 17:29:44.779516] I [master(worker /home/sas/gluster/data/code-misc6):1194:process_change] _GMaster: Retry original entries. count = 1
>>>> >>> >> [2019-11-16 17:29:44.879321] E [repce(worker /home/sas/gluster/data/code-misc6):214:__call__] RepceClient: call failed call=151945:140353273153344:1573925384.78 method=entry_ops error=OSError
>>>> >>> >> [2019-11-16 17:29:44.879750] E [syncdutils(worker /home/sas/gluster/data/code-misc6):338:log_raise_exception] <top>: FAIL:
>>>> >>> >> Traceback (most recent call last):
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 322, in main
>>>> >>> >>     func(args)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 82, in subcmd_worker
>>>> >>> >>     local.service_loop(remote)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1277, in service_loop
>>>> >>> >>     g3.crawlwrap(oneshot=True)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 599, in crawlwrap
>>>> >>> >>     self.crawl()
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1555, in crawl
>>>> >>> >>     self.changelogs_batch_process(changes)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1455, in changelogs_batch_process
>>>> >>> >>     self.process(batch)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1290, in process
>>>> >>> >>     self.process_change(change, done, retry)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1195, in process_change
>>>> >>> >>     failures = self.slave.server.entry_ops(entries)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 233, in __call__
>>>> >>> >>     return self.ins(self.meth, *a)
>>>> >>> >>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 215, in __call__
>>>> >>> >>     raise res
>>>> >>> >> OSError: [Errno 13] Permission denied: '/home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb'
>>>> >>> >> [2019-11-16 17:29:44.911767] I [repce(agent /home/sas/gluster/data/code-misc6):97:service_loop] RepceServer: terminating on reaching EOF.
>>>> >>> >> [2019-11-16 17:29:45.509344] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc6
>>>> >>> >> [2019-11-16 17:29:45.511806] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>> > Now, after upgrading from version 5.6 to 7.0, we got an error in geo-replication.
>>>> >>> > Scenario:
>>>> >>> >
>>>> >>> > We had a 1x3 replicated and distributed volume in each DC.
>>>> >>> > Both volumes were started, a geo-replication session was set up between them, and the files were synced. The geo-replication session was then deleted.
>>>> >>> > I then upgraded each server to 7.0, starting from the slave end. I followed this link --> https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/
>>>> >>> > After starting the glusterd process, I created the geo-replication session again, but it ends up in a Faulty state. Please find the logs below.
>>>> >>> >
>>>> >>> >> [2019-11-28 11:59:12.370255] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.370615] I [monitor(monitor):159:monitor] Monitor: starting gsyncd worker brick=/home/sas/gluster/data/code-misc slave_node=192.168.185.84
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.445581] I [gsyncd(agent /home/sas/gluster/data/code-misc):311:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.118_code-misc/gsyncd.conf
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.448383] I [changelogagent(agent /home/sas/gluster/data/code-misc):72:__init__] ChangelogAgent: Agent listining...
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.453881] I [gsyncd(worker /home/sas/gluster/data/code-misc):311:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.118_code-misc/gsyncd.conf
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.472862] I [resource(worker /home/sas/gluster/data/code-misc):1386:connect_remote] SSH: Initializing SSH connection between master and slave...
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.933346] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] <top>: connection to peer is broken
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.934117] E [syncdutils(worker /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-tKcFQe/5697733f424862ab9d57e019de78aca6.sock sas at 192.168.185.84 /usr/libexec/glusterfs/gsyncd slave code-misc sas at 192.168.185.118::code-misc --master-node 192.168.185.89 --master-node-id a7a9688e-700c-4452-9cd6-e10d6eed5335 --master-brick /home/sas/gluster/data/code-misc --local-node 192.168.185.84 --local-node-id cbafeca3-650b-4c9e-8ea6-2451ea9265dd --slave-timeout 120 --slave-log-level INFO --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin --master-dist-count 3 error=1
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.934436] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.934703] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> WARNING: This system is a restricted access system. All activity on this system is subject to monitoring. If information collected reveals possible criminal activity or activity that exceeds privileges, evidence of such activity may be providedto the relevant authorities for further action.
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.934967] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> By continuing past this point, you expressly consent to this monitoring.- ZOHO Corporation
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.935194] E [syncdutils(worker /home/sas/gluster/data/code-misc):809:logerr] Popen: ssh> **************************************************************************************************************************
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.944369] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF.
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.944722] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc
>>>> >>> >>
>>>> >>> >> [2019-11-28 11:59:12.947575] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>>>> >>> >
>>>> >>> >
>>>> >>>
>>>>