Senén Vidal Blanco
2020-Mar-25 09:14 UTC
[Gluster-users] Geo-Replication File not Found on /.glusterfs/XX/XX/XXXXXXXXXXXX
Hi,
I have a problem with the geo-replication system. The first synchronization completed successfully a few days ago, but after running for a short while it hits an error that prevents the sync from continuing. Here is a summary of the configuration:

Debian 10
Glusterfs 7.3
Master volume: archivosvao
Slave volume: archivossamil

volume geo-replication archivosvao samil::archivossamil config
access_mount:false
allow_network:
change_detector:changelog
change_interval:5
changelog_archive_format:%Y%m
changelog_batch_size:727040
changelog_log_file:/var/log/glusterfs/geo-replication/archivosvao_samil_archivossamil/changes-${local_id}.log
changelog_log_level:INFO
checkpoint:0
cli_log_file:/var/log/glusterfs/geo-replication/cli.log
cli_log_level:INFO
connection_timeout:60
georep_session_working_dir:/var/lib/glusterd/geo-replication/archivosvao_samil_archivossamil/
gfid_conflict_resolution:true
gluster_cli_options:
gluster_command:gluster
gluster_command_dir:/usr/sbin
gluster_log_file:/var/log/glusterfs/geo-replication/archivosvao_samil_archivossamil/mnt-${local_id}.log
gluster_log_level:INFO
gluster_logdir:/var/log/glusterfs
gluster_params:aux-gfid-mount acl
gluster_rundir:/var/run/gluster
glusterd_workdir:/var/lib/glusterd
gsyncd_miscdir:/var/lib/misc/gluster/gsyncd
ignore_deletes:false
isolated_slaves:
log_file:/var/log/glusterfs/geo-replication/archivosvao_samil_archivossamil/gsyncd.log
log_level:INFO
log_rsync_performance:false
master_disperse_count:1
master_distribution_count:1
master_replica_count:1
max_rsync_retries:10
meta_volume_mnt:/var/run/gluster/shared_storage
pid_file:/var/run/gluster/gsyncd-archivosvao-samil-archivossamil.pid
remote_gsyncd:
replica_failover_interval:1
rsync_command:rsync
rsync_opt_existing:true
rsync_opt_ignore_missing_args:true
rsync_options:
rsync_ssh_options:
slave_access_mount:false
slave_gluster_command_dir:/usr/sbin
slave_gluster_log_file:/var/log/glusterfs/geo-replication-slaves/archivosvao_samil_archivossamil/mnt-${master_node}-${master_brick_id}.log
slave_gluster_log_file_mbr:/var/log/glusterfs/geo-replication-slaves/archivosvao_samil_archivossamil/mnt-mbr-${master_node}-${master_brick_id}.log
slave_gluster_log_level:INFO
slave_gluster_params:aux-gfid-mount acl
slave_log_file:/var/log/glusterfs/geo-replication-slaves/archivosvao_samil_archivossamil/gsyncd.log
slave_log_level:INFO
slave_timeout:120
special_sync_mode:
ssh_command:ssh
ssh_options:-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
ssh_options_tar:-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
ssh_port:22
state_file:/var/lib/glusterd/geo-replication/archivosvao_samil_archivossamil/monitor.status
state_socket_unencoded:
stime_xattr_prefix:trusted.glusterfs.c7fa7778-f2e4-48f9-8817-5811c09964d5.8d4c7ef7-35fc-497a-9425-66f4aced159b
sync_acls:true
sync_jobs:3
sync_method:rsync
sync_xattrs:true
tar_command:tar
use_meta_volume:false
use_rsync_xattrs:false
working_dir:/var/lib/misc/gluster/gsyncd/archivosvao_samil_archivossamil/

gluster> volume info

Volume Name: archivossamil
Type: Distribute
Volume ID: 8d4c7ef7-35fc-497a-9425-66f4aced159b
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: samil:/brickarchivos/archivos
Options Reconfigured:
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
features.read-only: on

Volume Name: archivosvao
Type: Distribute
Volume ID: c7fa7778-f2e4-48f9-8817-5811c09964d5
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: vao:/brickarchivos/archivos
Options Reconfigured:
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on

Volume Name: home
Type: Replicate
Volume ID: 74522542-5d7a-4fdd-9cea-76bf1ff27e7d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: samil:/brickhome/home
Brick2: vao:/brickhome/home
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet

These errors appear in the master logs:

.............
[2020-03-25 09:00:12.554226] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=2 return_code=0 duration=0.0483
[2020-03-25 09:00:12.772688] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=2 num_files=3 return_code=0 duration=0.0539
[2020-03-25 09:00:13.112986] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=2 return_code=0 duration=0.0575
[2020-03-25 09:00:13.311976] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=2 num_files=1 return_code=0 duration=0.0379
[2020-03-25 09:00:13.382845] I [master(worker /brickarchivos/archivos):1227:process_change] _GMaster: Entry ops failed with gfid mismatch count=1
[2020-03-25 09:00:13.385680] E [syncdutils(worker /brickarchivos/archivos):339:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 332, in main
    func(args)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/subcmds.py", line 86, in subcmd_worker
    local.service_loop(remote)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 1297, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 602, in crawlwrap
    self.crawl()
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1592, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1492, in changelogs_batch_process
    self.process(batch)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1327, in process
    self.process_change(change, done, retry)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1230, in process_change
    self.handle_entry_failures(failures, entries)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 973, in handle_entry_failures
    failures1, retries, entry_ops1)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 936, in fix_possible_entry_failures
    pargfid))
FileNotFoundError: [Errno 2] No such file or directory: '/brickarchivos/archivos/.glusterfs/6e/eb/6eeb2c8f-da55-4066-995b-691290b69fdf'
[2020-03-25 09:00:13.435045] I [repce(agent /brickarchivos/archivos):96:service_loop] RepceServer: terminating on reaching EOF.
[2020-03-25 09:00:14.248754] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/brickarchivos/archivos
[2020-03-25 09:00:16.83872] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2020-03-25 09:00:36.304047] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
[2020-03-25 09:00:36.304274] I [monitor(monitor):159:monitor] Monitor: starting gsyncd worker brick=/brickarchivos/archivos slave_node=samil
[2020-03-25 09:00:36.391111] I [gsyncd(agent /brickarchivos/archivos):318:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/archivosvao_samil_archivossamil/gsyncd.conf
[2020-03-25 09:00:36.392865] I [changelogagent(agent /brickarchivos/archivos):72:__init__] ChangelogAgent: Agent listining...
[2020-03-25 09:00:36.399606] I [gsyncd(worker /brickarchivos/archivos):318:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/archivosvao_samil_archivossamil/gsyncd.conf
[2020-03-25 09:00:36.412956] I [resource(worker /brickarchivos/archivos):1386:connect_remote] SSH: Initializing SSH connection between master and slave...
[2020-03-25 09:00:37.772666] I [resource(worker /brickarchivos/archivos):1435:connect_remote] SSH: SSH connection between master and slave established. duration=1.3594
[2020-03-25 09:00:37.773320] I [resource(worker /brickarchivos/archivos):1105:connect] GLUSTER: Mounting gluster volume locally...
[2020-03-25 09:00:38.821624] I [resource(worker /brickarchivos/archivos):1128:connect] GLUSTER: Mounted gluster volume duration=1.0479
[2020-03-25 09:00:38.822003] I [subcmds(worker /brickarchivos/archivos):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2020-03-25 09:00:41.797329] I [master(worker /brickarchivos/archivos):1640:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/archivosvao_samil_archivossamil/brickarchivos-archivos
[2020-03-25 09:00:41.798168] I [resource(worker /brickarchivos/archivos):1291:service_loop] GLUSTER: Register time time=1585126841
[2020-03-25 09:00:42.143373] I [gsyncdstatus(worker /brickarchivos/archivos):281:set_active] GeorepStatus: Worker Status Change status=Active
[2020-03-25 09:00:42.310175] I [gsyncdstatus(worker /brickarchivos/archivos):253:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2020-03-25 09:00:42.311381] I [master(worker /brickarchivos/archivos):1554:crawl] _GMaster: starting history crawl turns=1 stime=(1585015849, 0) etime=1585126842 entry_stime=(1585043575, 0)
[2020-03-25 09:00:43.347883] I [master(worker /brickarchivos/archivos):1583:crawl] _GMaster: slave's time stime=(1585015849, 0)
[2020-03-25 09:00:43.932979] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=7 return_code=0 duration=0.1022
[2020-03-25 09:00:43.980473] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=1 return_code=0 duration=0.0467
[2020-03-25 09:00:44.387296] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=2 num_files=5 return_code=0 duration=0.0539
[2020-03-25 09:00:44.424803] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=2 num_files=1 return_code=0 duration=0.0368
[2020-03-25 09:00:44.877503] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=3 num_files=4 return_code=0 duration=0.0431
[2020-03-25 09:00:44.918785] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=3 num_files=3 return_code=0 duration=0.0403
[2020-03-25 09:00:45.20351] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=1 return_code=0 duration=0.0382
[2020-03-25 09:00:45.55611] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=1 return_code=0 duration=0.0344
[2020-03-25 09:00:45.90699] I [master(worker /brickarchivos/archivos):1991:syncjob] Syncer: Sync Time Taken job=1 num_files=1 return_code=0 duration=0.0341
.............

It seems that the source of the error is the absence of this file on the master brick:

FileNotFoundError: [Errno 2] No such file or directory: '/brickarchivos/archivos/.glusterfs/6e/eb/6eeb2c8f-da55-4066-995b-691290b69fdf'
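For reference, this is roughly how that .glusterfs back-link can be inspected directly on the master brick. This is only a sketch: it assumes it is run as root on the brick node (vao), the gfid and brick path are simply the ones from the log above, getfattr comes from the attr package, and for directories the .glusterfs entry is a symlink rather than a hard link:

# Does the .glusterfs back-link for the reported gfid still exist on the brick?
ls -l /brickarchivos/archivos/.glusterfs/6e/eb/6eeb2c8f-da55-4066-995b-691290b69fdf

# For a regular file the back-link is a hard link, so its real path can be located with:
find /brickarchivos/archivos -samefile \
    /brickarchivos/archivos/.glusterfs/6e/eb/6eeb2c8f-da55-4066-995b-691290b69fdf 2>/dev/null

# The gfid stored on any given file or directory can be read from its brick path
# (<some-path> is illustrative, not a real path from this setup):
getfattr -n trusted.gfid -e hex /brickarchivos/archivos/<some-path>

If the back-link is really gone while the file it should point to still exists, or the gfid for the same path differs between master and slave, that would fit the "Entry ops failed with gfid mismatch" message in the log.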
When the error appears, it tries to synchronize again and enters a retry loop.

I stopped the sync and tried to resume it the next day. Now the error shows up for a different file, but always under the .glusterfs index path of the brick:

[2020-03-23 16:49:20.729115] I [master(worker /brickarchivos/archivos):1227:process_change] _GMaster: Entry ops failed with gfid mismatch count=1
[2020-03-23 16:49:20.731028] E [syncdutils(worker /brickarchivos/archivos):339:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 332, in main
    func(args)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/subcmds.py", line 86, in subcmd_worker
    local.service_loop(remote)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 1297, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 602, in crawlwrap
    self.crawl()
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1592, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1492, in changelogs_batch_process
    self.process(batch)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1327, in process
    self.process_change(change, done, retry)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1230, in process_change
    self.handle_entry_failures(failures, entries)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 973, in handle_entry_failures
    failures1, retries, entry_ops1)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 936, in fix_possible_entry_failures
    pargfid))
FileNotFoundError: [Errno 2] No such file or directory: '/brickarchivos/archivos/.glusterfs/63/11/63113be6-0774-4719-96a6-619f7777aed2'
[2020-03-23 16:49:20.764215] I [repce(agent /brickarchivos/archivos):96:service_loop] RepceServer: terminating on reaching EOF.

I have tried to remove the geo-replication session and recreate it, but the problem recurs. I am not deleting the slave data, since there is more than 2.5 TB and a full resync would take several days:

volume geo-replication archivosvao samil::archivossamil stop
volume geo-replication archivosvao samil::archivossamil delete
volume set archivosvao geo-replication.indexing off
volume geo-replication archivosvao samil::archivossamil create push-pem force
volume geo-replication archivosvao samil::archivossamil start

But with no luck so far.

Any help would be appreciated.
Thank you.
Sunny Kumar
2020-Mar-25 18:08 UTC
[Gluster-users] Geo-Replication File not Found on /.glusterfs/XX/XX/XXXXXXXXXXXX
Hi Senén,

Did you by any chance perform any operation on the slave volume, such as deleting data directly from it?

Also, if possible, please share the geo-rep slave logs.

/sunny

On Wed, Mar 25, 2020 at 9:15 AM Senén Vidal Blanco <senenvidal at sgisoft.com> wrote:
> [snip]
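In case it helps while gathering that: the slave-side logs for this session live under the geo-replication-slaves directory shown in the config output above, and a gfid mismatch can be spotted by comparing the gfid xattr of the same path on the two bricks. A rough sketch only, assuming root access on each node, getfattr from the attr package, and an illustrative placeholder path:

# On samil: slave-side gsyncd / mount logs for this session (paths taken from the config above)
ls -l /var/log/glusterfs/geo-replication-slaves/archivosvao_samil_archivossamil/

# Compare the gfid of a path that keeps failing, as seen by each brick:
getfattr -n trusted.gfid -e hex /brickarchivos/archivos/<failing-path>   # on vao (master brick)
getfattr -n trusted.gfid -e hex /brickarchivos/archivos/<failing-path>   # on samil (slave brick)

If the two values differ, or the file exists on only one side, that would suggest the slave copy was changed outside of geo-replication.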