Kotresh Hiremath Ravishankar
2019-Nov-19 05:13 UTC
[Gluster-users] Geo_replication to Faulty
This issue has recently been fixed with the following patch, and the fix should be available in the latest gluster-6.x releases:
https://review.gluster.org/#/c/glusterfs/+/23570/
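For context on why this turns into a Faulty session: gsyncd wraps its entry operations in an errno whitelist, tolerating or retrying the errnos it knows are harmless during a crawl and re-raising everything else. A minimal sketch of that pattern (simplified and illustrative, not the exact gsyncd code):

    import errno

    def errno_wrap(call, args=(), tolerated=(), retry_on=(), max_retries=5):
        # Simplified sketch of the errno whitelisting visible in the
        # tracebacks quoted below; not the exact gsyncd implementation.
        for attempt in range(max_retries):
            try:
                return call(*args)
            except OSError as e:
                if e.errno in tolerated:
                    return                # e.g. ENOENT: entry gone, ignore
                if e.errno in retry_on and attempt < max_retries - 1:
                    continue              # e.g. ESTALE: transient, retry
                raise                     # EACCES ([Errno 13]) lands here

In the slave traceback below, get_slv_dir_path whitelists only [ENOENT] and [ESTALE], so the EACCES raised on the root-owned .glusterfs entry propagates, the worker dies, and the monitor marks the brick Faulty.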
On Tue, Nov 19, 2019 at 10:26 AM deepu srinivasan <sdeepugd at gmail.com> wrote:
>
> Hi Aravinda
>
> *The logs below are from the master end:*
>
> [2019-11-16 17:29:43.536881] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):281:set_active] GeorepStatus: Worker Status Change status=Active
> [2019-11-16 17:29:43.629620] I [gsyncdstatus(worker /home/sas/gluster/data/code-misc6):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2019-11-16 17:29:43.630328] I [master(worker /home/sas/gluster/data/code-misc6):1517:crawl] _GMaster: starting history crawl turns=1 stime=(1573924576, 0) entry_stime=(1573924576, 0) etime=1573925383
> [2019-11-16 17:29:44.636725] I [master(worker /home/sas/gluster/data/code-misc6):1546:crawl] _GMaster: slave's time stime=(1573924576, 0)
> [2019-11-16 17:29:44.778966] I [master(worker /home/sas/gluster/data/code-misc6):898:fix_possible_entry_failures] _GMaster: Fixing ENOENT error in slave. Parent does not exist on master. Safe to ignore, take out entry retry_count=1 entry=({'uid': 0, 'gfid': 'c02519e0-0ead-4fe8-902b-dcae72ef83a3', 'gid': 0, 'mode': 33188, 'entry': '.gfid/d60aa0d5-4fdf-4721-97dc-9e3e50995dab/368307802', 'op': 'CREATE'}, 2, {'slave_isdir': False, 'gfid_mismatch': False, 'slave_name': None, 'slave_gfid': None, 'name_mismatch': False, 'dst': False})
> [2019-11-16 17:29:44.779306] I [master(worker /home/sas/gluster/data/code-misc6):942:handle_entry_failures] _GMaster: Sucessfully fixed entry ops with gfid mismatch retry_count=1
> [2019-11-16 17:29:44.779516] I [master(worker /home/sas/gluster/data/code-misc6):1194:process_change] _GMaster: Retry original entries. count = 1
> [2019-11-16 17:29:44.879321] E [repce(worker /home/sas/gluster/data/code-misc6):214:__call__] RepceClient: call failed call=151945:140353273153344:1573925384.78 method=entry_ops error=OSError
> [2019-11-16 17:29:44.879750] E [syncdutils(worker /home/sas/gluster/data/code-misc6):338:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 322, in main
>     func(args)
>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 82, in subcmd_worker
>     local.service_loop(remote)
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1277, in service_loop
>     g3.crawlwrap(oneshot=True)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 599, in crawlwrap
>     self.crawl()
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1555, in crawl
>     self.changelogs_batch_process(changes)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1455, in changelogs_batch_process
>     self.process(batch)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1290, in process
>     self.process_change(change, done, retry)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1195, in process_change
>     failures = self.slave.server.entry_ops(entries)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 233, in __call__
>     return self.ins(self.meth, *a)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 215, in __call__
>     raise res
> OSError: [Errno 13] Permission denied: '/home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb'
> [2019-11-16 17:29:44.911767] I [repce(agent /home/sas/gluster/data/code-misc6):97:service_loop] RepceServer: terminating on reaching EOF.
> [2019-11-16 17:29:45.509344] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc6
> [2019-11-16 17:29:45.511806] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
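The path in the traceback is the .glusterfs backend entry for gfid 6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb on the slave brick. A quick way to confirm the ownership problem is to stat that entry as the non-root geo-rep user (a sketch; the path is copied from the traceback, adjust for your brick):

    import os
    import stat

    # Path copied from the traceback above.
    path = ('/home/sas/gluster/data/code-misc6/.glusterfs/'
            '6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb')

    try:
        st = os.lstat(path)  # for a directory gfid this entry is a symlink
        print('uid=%d gid=%d mode=%s symlink=%s' % (
            st.st_uid, st.st_gid, oct(stat.S_IMODE(st.st_mode)),
            stat.S_ISLNK(st.st_mode)))
    except OSError as e:
        print('lstat failed:', e)  # EACCES reproduces what the worker hit

An entry owned by uid 0, or an unreadable parent component, matches the "processes running as root wrote to the mount" history described at the bottom of this thread.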
>
> *The logs below are from the slave end:*
>
> [2019-11-16 17:24:42.281599] I [resource(slave 192.168.185.106/home/sas/gluster/data/code-misc6):580:entry_ops] <top>: Special case: rename on mkdir gfid=6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb entry='.gfid/a8921d78-a078-46d3-aca5-8b078eb62cac/8878061b-d5b3-47a6-b01c-8310fee39b20'
> [2019-11-16 17:24:42.370582] E [repce(slave 192.168.185.106/home/sas/gluster/data/code-misc6):122:worker] <top>: call failed:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in worker
>     res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 581, in entry_ops
>     src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 690, in get_slv_dir_path
>     [ENOENT], [ESTALE])
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 546, in errno_wrap
>     return call(*arg)
> OSError: [Errno 13] Permission denied: '/home/sas/gluster/data/code-misc6/.glusterfs/6a/90/6a9008b1-a4aa-4c30-9ae7-92a33e05d0bb'
> [2019-11-16 17:24:42.400402] I [repce(slave 192.168.185.106/home/sas/gluster/data/code-misc6):97:service_loop] RepceServer: terminating on reaching EOF.
> [2019-11-16 17:24:53.403165] W [gsyncd(slave 192.168.185.106/home/sas/gluster/data/code-misc6):304:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.con
>
> On Sat, Nov 16, 2019, 9:26 PM Aravinda Vishwanathapura Krishna Murthy <avishwan at redhat.com> wrote:
>
>> Hi Deepu,
>>
>> Please share the reason for the Faulty state from the geo-rep logs of the respective master node.
>>
>> On Sat, Nov 16, 2019 at 1:01 AM deepu srinivasan <sdeepugd at gmail.com> wrote:
>>
>>> Hi Users/Development Team,
>>>
>>> We have set up a geo-replication session with a non-root user on the slave in our DC.
>>> It was working well, with Active status and Changelog Crawl.
>>>
>>> We mounted the master volume, and files were being written to it.
>>> Some processes were running as the root user, so they wrote files and folders with root permissions.
>>> After stopping and restarting the geo-replication session, it went to the Faulty state.
>>> How do we recover?
>>
>> --
>> regards
>> Aravinda VK

--
Thanks and Regards,
Kotresh H R
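For anyone hitting the same fault before moving to a patched build, the scope of the problem can be estimated by listing root-owned entries on the slave brick. A minimal sketch, assuming the brick path and non-root user ('sas') from this thread; internal .glusterfs housekeeping is normally root-owned, so it is pruned, and every data file shares ownership with its .glusterfs hard link anyway:

    import os

    # Assumption from this thread: the slave brick path below.
    BRICK = '/home/sas/gluster/data/code-misc6'

    for root, dirs, files in os.walk(BRICK):
        if root == BRICK and '.glusterfs' in dirs:
            dirs.remove('.glusterfs')      # prune the backend tree
        for name in dirs + files:
            path = os.path.join(root, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue                   # unreadable: also suspicious
            if st.st_uid == 0:             # written by a root process
                print(path)

Once ownership on the slave side is corrected (or after upgrading to a release carrying the patch above), the session can be restarted, e.g. gluster volume geo-replication code-misc sas@192.168.185.107::code-misc stop, then start.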