Senén Vidal Blanco
2020-Mar-25 09:14 UTC
[Gluster-users] Geo-Replication File not Found on /.glusterfs/XX/XX/XXXXXXXXXXXX
Hi,
I have a problem with the geo-replication system. The first synchronization completed successfully a few days ago, but after running for a short while it hits an error that prevents the sync from continuing. Here is a summary of the configuration:

Debian 10
Glusterfs 7.3
Master volume: archivosvao
Slave volume: archivossamil

volume geo-replication archivosvao samil::archivossamil config
access_mount:false
allow_network:
change_detector:changelog
change_interval:5
changelog_archive_format:%Y%m
changelog_batch_size:727040
changelog_log_file:/var/log/glusterfs/geo-replication/archivosvao_samil_archivossamil/changes-${local_id}.log
changelog_log_level:INFO
checkpoint:0
cli_log_file:/var/log/glusterfs/geo-replication/cli.log
cli_log_level:INFO
connection_timeout:60
georep_session_working_dir:/var/lib/glusterd/geo-replication/archivosvao_samil_archivossamil/
gfid_conflict_resolution:true
gluster_cli_options:
gluster_command:gluster
gluster_command_dir:/usr/sbin
gluster_log_file:/var/log/glusterfs/geo-replication/archivosvao_samil_archivossamil/mnt-${local_id}.log
gluster_log_level:INFO
gluster_logdir:/var/log/glusterfs
gluster_params:aux-gfid-mount acl
gluster_rundir:/var/run/gluster
glusterd_workdir:/var/lib/glusterd
gsyncd_miscdir:/var/lib/misc/gluster/gsyncd
ignore_deletes:false
isolated_slaves:
log_file:/var/log/glusterfs/geo-replication/archivosvao_samil_archivossamil/gsyncd.log
log_level:INFO
log_rsync_performance:false
master_disperse_count:1
master_distribution_count:1
master_replica_count:1
max_rsync_retries:10
meta_volume_mnt:/var/run/gluster/shared_storage
pid_file:/var/run/gluster/gsyncd-archivosvao-samil-archivossamil.pid
remote_gsyncd:
replica_failover_interval:1
rsync_command:rsync
rsync_opt_existing:true
rsync_opt_ignore_missing_args:true
rsync_options:
rsync_ssh_options:
slave_access_mount:false
slave_gluster_command_dir:/usr/sbin
slave_gluster_log_file:/var/log/glusterfs/geo-replication-slaves/archivosvao_samil_archivossamil/mnt-${master_node}-${master_brick_id}.log
slave_gluster_log_file_mbr:/var/log/glusterfs/geo-replication-slaves/archivosvao_samil_archivossamil/mnt-mbr-${master_node}-${master_brick_id}.log
slave_gluster_log_level:INFO
slave_gluster_params:aux-gfid-mount acl
slave_log_file:/var/log/glusterfs/geo-replication-slaves/archivosvao_samil_archivossamil/gsyncd.log
slave_log_level:INFO
slave_timeout:120
special_sync_mode:
ssh_command:ssh
ssh_options:-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
ssh_options_tar:-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
ssh_port:22
state_file:/var/lib/glusterd/geo-replication/archivosvao_samil_archivossamil/monitor.status
state_socket_unencoded:
stime_xattr_prefix:trusted.glusterfs.c7fa7778-f2e4-48f9-8817-5811c09964d5.8d4c7ef7-35fc-497a-9425-66f4aced159b
sync_acls:true
sync_jobs:3
sync_method:rsync
sync_xattrs:true
tar_command:tar
use_meta_volume:false
use_rsync_xattrs:false
working_dir:/var/lib/misc/gluster/gsyncd/archivosvao_samil_archivossamil/

gluster> volume info

Volume Name: archivossamil
Type: Distribute
Volume ID: 8d4c7ef7-35fc-497a-9425-66f4aced159b
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: samil:/brickarchivos/archivos
Options Reconfigured:
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
features.read-only: on

Volume Name: archivosvao
Type: Distribute
Volume ID: c7fa7778-f2e4-48f9-8817-5811c09964d5
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: vao:/brickarchivos/archivos
Options Reconfigured:
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on

Volume Name: home
Type: Replicate
Volume ID: 74522542-5d7a-4fdd-9cea-76bf1ff27e7d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: samil:/brickhome/home
Brick2: vao:/brickhome/home
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet

These errors appear in the master logs:

.............
[2020-03-25 09:00:12.554226] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=2 return_code=0 duration=0.0483
[2020-03-25 09:00:12.772688] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=2 num_files=3 return_code=0 duration=0.0539
[2020-03-25 09:00:13.112986] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=2 return_code=0 duration=0.0575
[2020-03-25 09:00:13.311976] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=2 num_files=1 return_code=0 duration=0.0379
[2020-03-25 09:00:13.382845] I [master(worker /brickarchivos/archivos):1227:process_change] _GMaster: Entry ops failed with gfid mismatch count=1
[2020-03-25 09:00:13.385680] E [syncdutils(worker /brickarchivos/archivos):339:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 332, in main
    func(args)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/subcmds.py", line 86, in subcmd_worker
    local.service_loop(remote)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 1297, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 602, in crawlwrap
    self.crawl()
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1592, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1492, in changelogs_batch_process
    self.process(batch)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1327, in process
    self.process_change(change, done, retry)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1230, in process_change
    self.handle_entry_failures(failures, entries)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 973, in handle_entry_failures
    failures1, retries, entry_ops1)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 936, in fix_possible_entry_failures
    pargfid))
FileNotFoundError: [Errno 2] No such file or directory: '/brickarchivos/archivos/.glusterfs/6e/eb/6eeb2c8f-da55-4066-995b-691290b69fdf'
[2020-03-25 09:00:13.435045] I [repce(agent /brickarchivos/archivos):96:service_loop] RepceServer: terminating on reaching EOF.
[2020-03-25 09:00:14.248754] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/brickarchivos/archivos
[2020-03-25 09:00:16.83872] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2020-03-25 09:00:36.304047] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
[2020-03-25 09:00:36.304274] I [monitor(monitor):159:monitor] Monitor: starting gsyncd worker brick=/brickarchivos/archivos slave_node=samil
[2020-03-25 09:00:36.391111] I [gsyncd(agent /brickarchivos/archivos):318:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/archivosvao_samil_archivossamil/gsyncd.conf
[2020-03-25 09:00:36.392865] I [changelogagent(agent /brickarchivos/archivos):72:__init__] ChangelogAgent: Agent listining...
[2020-03-25 09:00:36.399606] I [gsyncd(worker /brickarchivos/archivos):318:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/archivosvao_samil_archivossamil/gsyncd.conf
[2020-03-25 09:00:36.412956] I [resource(worker /brickarchivos/archivos):1386:connect_remote] SSH: Initializing SSH connection between master and slave...
[2020-03-25 09:00:37.772666] I [resource(worker /brickarchivos/archivos):1435:connect_remote] SSH: SSH connection between master and slave established. duration=1.3594
[2020-03-25 09:00:37.773320] I [resource(worker /brickarchivos/archivos):1105:connect] GLUSTER: Mounting gluster volume locally...
[2020-03-25 09:00:38.821624] I [resource(worker /brickarchivos/archivos):1128:connect] GLUSTER: Mounted gluster volume duration=1.0479
[2020-03-25 09:00:38.822003] I [subcmds(worker /brickarchivos/archivos):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2020-03-25 09:00:41.797329] I [master(worker /brickarchivos/archivos):1640:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/archivosvao_samil_archivossamil/brickarchivos-archivos
[2020-03-25 09:00:41.798168] I [resource(worker /brickarchivos/archivos):1291:service_loop] GLUSTER: Register time time=1585126841
[2020-03-25 09:00:42.143373] I [gsyncdstatus(worker /brickarchivos/archivos):281:set_active] GeorepStatus: Worker Status Change status=Active
[2020-03-25 09:00:42.310175] I [gsyncdstatus(worker /brickarchivos/archivos):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2020-03-25 09:00:42.311381] I [master(worker /brickarchivos/archivos):1554:crawl] _GMaster: starting history crawl turns=1 stime=(1585015849, 0) etime=1585126842 entry_stime=(1585043575, 0)
[2020-03-25 09:00:43.347883] I [master(worker /brickarchivos/archivos):1583:crawl] _GMaster: slave's time stime=(1585015849, 0)
[2020-03-25 09:00:43.932979] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=7 return_code=0 duration=0.1022
[2020-03-25 09:00:43.980473] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=1 return_code=0 duration=0.0467
[2020-03-25 09:00:44.387296] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=2 num_files=5 return_code=0 duration=0.0539
[2020-03-25 09:00:44.424803] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=2 num_files=1 return_code=0 duration=0.0368
[2020-03-25 09:00:44.877503] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=3 num_files=4 return_code=0 duration=0.0431
[2020-03-25 09:00:44.918785] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=3 num_files=3 return_code=0 duration=0.0403
[2020-03-25 09:00:45.20351] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=1 return_code=0 duration=0.0382
[2020-03-25 09:00:45.55611] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=1 return_code=0 duration=0.0344
[2020-03-25 09:00:45.90699] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=1 return_code=0 duration=0.0341
.............

It seems that the source of the error is the absence of this file on the master brick:

FileNotFoundError: [Errno 2] No such file or directory: '/brickarchivos/archivos/.glusterfs/6e/eb/6eeb2c8f-da55-4066-995b-691290b69fdf'
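For reference, this is roughly how that .glusterfs back-link can be inspected directly on the master brick. This is only a sketch: it assumes it is run as root on the brick node (vao), the gfid and brick path are simply the ones from the log above, getfattr comes from the attr package, and for directories the .glusterfs entry is a symlink rather than a hard link:

# Does the .glusterfs back-link for the reported gfid still exist on the brick?
ls -l /brickarchivos/archivos/.glusterfs/6e/eb/6eeb2c8f-da55-4066-995b-691290b69fdf

# For a regular file the back-link is a hard link, so its real path can be located with:
find /brickarchivos/archivos -samefile \
    /brickarchivos/archivos/.glusterfs/6e/eb/6eeb2c8f-da55-4066-995b-691290b69fdf 2>/dev/null

# The gfid stored on any given file or directory can be read from its brick path
# (<some-path> is illustrative, not a real path from this setup):
getfattr -n trusted.gfid -e hex /brickarchivos/archivos/<some-path>

If the back-link is really gone while the file it should point to still exists, or the gfid for the same path differs between master and slave, that would fit the "Entry ops failed with gfid mismatch" message in the log.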
When the error appears, it tries to synchronize again and enters a retry loop.

I stopped the sync and tried to resume it the next day. Now the error shows up for a different file, but always under the .glusterfs index path of the brick:

[2020-03-23 16:49:20.729115] I [master(worker /brickarchivos/archivos):1227:process_change] _GMaster: Entry ops failed with gfid mismatch count=1
[2020-03-23 16:49:20.731028] E [syncdutils(worker /brickarchivos/archivos):339:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 332, in main
    func(args)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/subcmds.py", line 86, in subcmd_worker
    local.service_loop(remote)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 1297, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 602, in crawlwrap
    self.crawl()
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1592, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1492, in changelogs_batch_process
    self.process(batch)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1327, in process
    self.process_change(change, done, retry)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1230, in process_change
    self.handle_entry_failures(failures, entries)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 973, in handle_entry_failures
    failures1, retries, entry_ops1)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 936, in fix_possible_entry_failures
    pargfid))
FileNotFoundError: [Errno 2] No such file or directory: '/brickarchivos/archivos/.glusterfs/63/11/63113be6-0774-4719-96a6-619f7777aed2'
[2020-03-23 16:49:20.764215] I [repce(agent /brickarchivos/archivos):96:service_loop] RepceServer: terminating on reaching EOF.

I have tried to remove the geo-replication session and recreate it, but the problem recurs. I am not deleting the slave data, since there is more than 2.5 TB and a full resync would take several days:

volume geo-replication archivosvao samil::archivossamil stop
volume geo-replication archivosvao samil::archivossamil delete
volume set archivosvao geo-replication.indexing off
volume geo-replication archivosvao samil::archivossamil create push-pem force
volume geo-replication archivosvao samil::archivossamil start

But with no luck so far.

Any help would be appreciated.
Thank you.
Sunny Kumar
2020-Mar-25 18:08 UTC
[Gluster-users] Geo-Replication File not Found on /.glusterfs/XX/XX/XXXXXXXXXXXX
Hi Senén,

Did you by any chance perform any operation on the slave volume, such as deleting data directly from it?

Also, if possible, please share the geo-rep slave logs.

/sunny

On Wed, Mar 25, 2020 at 9:15 AM Senén Vidal Blanco <senenvidal at sgisoft.com> wrote:
> [snip]
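In case it helps while gathering that: the slave-side logs for this session live under the geo-replication-slaves directory shown in the config output above, and a gfid mismatch can be spotted by comparing the gfid xattr of the same path on the two bricks. A rough sketch only, assuming root access on each node, getfattr from the attr package, and an illustrative placeholder path:

# On samil: slave-side gsyncd / mount logs for this session (paths taken from the config above)
ls -l /var/log/glusterfs/geo-replication-slaves/archivosvao_samil_archivossamil/

# Compare the gfid of a path that keeps failing, as seen by each brick:
getfattr -n trusted.gfid -e hex /brickarchivos/archivos/<failing-path>   # on vao (master brick)
getfattr -n trusted.gfid -e hex /brickarchivos/archivos/<failing-path>   # on samil (slave brick)

If the two values differ, or the file exists on only one side, that would suggest the slave copy was changed outside of geo-replication.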