Marcus Pedersén
2018-Aug-13  18:39 UTC
[Gluster-users] Geo-replication stops after 4-5 hours
Hi again,
A new change in behaviour: both of the master nodes that are active toggle to Faulty, and
the logs repeat the same sequence over and over again.
Part of log, node1:
[2018-08-13 18:24:44.701711] I [gsyncdstatus(worker
/urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change       
status=Active
[2018-08-13 18:24:44.704360] I [gsyncdstatus(worker
/urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change
status=History Crawl
[2018-08-13 18:24:44.705162] I [master(worker /urd-gds/gluster):1448:crawl]
_GMaster: starting history crawl    turns=1 stime=(1523907056, 0)  
entry_stime=None        etime=1534184684
[2018-08-13 18:24:45.717072] I [master(worker /urd-gds/gluster):1477:crawl]
_GMaster: slave's time      stime=(1523907056, 0)
[2018-08-13 18:24:45.904958] E [repce(worker /urd-gds/gluster):197:__call__]
RepceClient: call failed   call=5919:140339726538560:1534184685.88
method=entry_ops        error=GsyncdError
[2018-08-13 18:24:45.905111] E [syncdutils(worker
/urd-gds/gluster):298:log_raise_exception] <top>: execution of
"gluster" failed with ENOENT (No such file or directory)
[2018-08-13 18:24:45.919265] I [repce(agent /urd-gds/gluster):80:service_loop]
RepceServer: terminating on reaching EOF.
[2018-08-13 18:24:46.553194] I [monitor(monitor):272:monitor] Monitor: worker
died in startup phase     brick=/urd-gds/gluster
[2018-08-13 18:24:46.561784] I [gsyncdstatus(monitor):243:set_worker_status]
GeorepStatus: Worker Status Change status=Faulty
[2018-08-13 18:24:56.581748] I [monitor(monitor):158:monitor] Monitor: starting
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-08-13 18:24:56.655164] I [gsyncd(worker /urd-gds/gluster):297:main]
<top>: Using session config file     
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-13 18:24:56.655193] I [gsyncd(agent /urd-gds/gluster):297:main]
<top>: Using session config file      
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-13 18:24:56.655889] I [changelogagent(agent
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-13 18:24:56.664628] I [resource(worker
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between
master and slave...
[2018-08-13 18:24:58.347415] I [resource(worker
/urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and
slave established.        duration=1.6824
[2018-08-13 18:24:58.348151] I [resource(worker /urd-gds/gluster):1067:connect]
GLUSTER: Mounting gluster volume locally...
[2018-08-13 18:24:59.463598] I [resource(worker /urd-gds/gluster):1090:connect]
GLUSTER: Mounted gluster volume duration=1.1150
[2018-08-13 18:24:59.464184] I [subcmds(worker
/urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2018-08-13 18:25:01.549007] I [master(worker /urd-gds/gluster):1534:register]
_GMaster: Working dir   
path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-13 18:25:01.549606] I [resource(worker
/urd-gds/gluster):1253:service_loop] GLUSTER: Register time     time=1534184701
[2018-08-13 18:25:01.593946] I [gsyncdstatus(worker
/urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change       
status=Active
Part of log, node2:
[2018-08-13 18:25:14.554233] I [gsyncdstatus(monitor):243:set_worker_status]
GeorepStatus: Worker Status Change status=Faulty
[2018-08-13 18:25:24.568727] I [monitor(monitor):158:monitor] Monitor: starting
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-08-13 18:25:24.609642] I [gsyncd(agent /urd-gds/gluster):297:main]
<top>: Using session config file      
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-13 18:25:24.609678] I [gsyncd(worker /urd-gds/gluster):297:main]
<top>: Using session config file     
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-13 18:25:24.610362] I [changelogagent(agent
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-13 18:25:24.621551] I [resource(worker
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between
master and slave...
[2018-08-13 18:25:26.164855] I [resource(worker
/urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and
slave established.        duration=1.5431
[2018-08-13 18:25:26.165124] I [resource(worker /urd-gds/gluster):1067:connect]
GLUSTER: Mounting gluster volume locally...
[2018-08-13 18:25:27.331969] I [resource(worker /urd-gds/gluster):1090:connect]
GLUSTER: Mounted gluster volume duration=1.1667
[2018-08-13 18:25:27.335560] I [subcmds(worker
/urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2018-08-13 18:25:37.768867] I [master(worker /urd-gds/gluster):1534:register]
_GMaster: Working dir   
path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-13 18:25:37.769479] I [resource(worker
/urd-gds/gluster):1253:service_loop] GLUSTER: Register time     time=1534184737
[2018-08-13 18:25:37.787317] I [gsyncdstatus(worker
/urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change       
status=Active
[2018-08-13 18:25:37.789822] I [gsyncdstatus(worker
/urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change
status=History Crawl
[2018-08-13 18:25:37.790008] I [master(worker /urd-gds/gluster):1448:crawl]
_GMaster: starting history crawl    turns=1 stime=(1525290650, 0)  
entry_stime=(1525296245, 0)     etime=1534184737
[2018-08-13 18:25:37.791222] I [master(worker /urd-gds/gluster):1477:crawl]
_GMaster: slave's time      stime=(1525290650, 0)
[2018-08-13 18:25:38.63499] I [master(worker /urd-gds/gluster):1301:process]
_GMaster: Skipping already processed entry ops     to_changelog=1525290651
num_changelogs=1        from_changelog=1525290651
[2018-08-13 18:25:38.63621] I [master(worker /urd-gds/gluster):1315:process]
_GMaster: Entry Time Taken MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0   CRE=0
duration=0.0000 UNL=0
[2018-08-13 18:25:38.63678] I [master(worker /urd-gds/gluster):1325:process]
_GMaster: Data/Metadata Time Taken SETA=1  SETX=0  meta_duration=0.0228   
data_duration=0.2456    DATA=0  XATT=0
[2018-08-13 18:25:38.63822] I [master(worker /urd-gds/gluster):1335:process]
_GMaster: Batch Completed  changelog_end=1525290651       
entry_stime=(1525296245, 0)     changelog_start=1525290651      stime=(152\
5290650, 0)   duration=0.2723 num_changelogs=1        mode=history_changelog
[2018-08-13 18:25:38.73400] I [master(worker /urd-gds/gluster):1477:crawl]
_GMaster: slave's time       stime=(1525290650, 0)
[2018-08-13 18:25:38.480941] I [master(worker /urd-gds/gluster):1885:syncjob]
Syncer: Sync Time Taken   duration=0.1327 num_files=3     job=3   return_code=23
[2018-08-13 18:25:39.963423] I [master(worker /urd-gds/gluster):1885:syncjob]
Syncer: Sync Time Taken   duration=0.1133 num_files=8     job=1   return_code=23
[2018-08-13 18:25:39.980724] I [master(worker /urd-gds/gluster):1885:syncjob]
Syncer: Sync Time Taken   duration=0.6315 num_files=47    job=2   return_code=23
...............
[2018-08-13 18:26:04.534953] I [master(worker /urd-gds/gluster):1885:syncjob]
Syncer: Sync Time Taken   duration=0.0988 num_files=18    job=2   return_code=23
[2018-08-13 18:26:07.798583] I [master(worker /urd-gds/gluster):1885:syncjob]
Syncer: Sync Time Taken   duration=0.2600 num_files=27    job=2   return_code=23
[2018-08-13 18:26:08.708100] I [master(worker /urd-gds/gluster):1885:syncjob]
Syncer: Sync Time Taken   duration=0.4090 num_files=67    job=2   return_code=23
[2018-08-13 18:26:14.865883] E [repce(worker /urd-gds/gluster):197:__call__]
RepceClient: call failed   call=18662:140079998809920:1534184774.58       
method=entry_ops        error=GsyncdError
[2018-08-13 18:26:14.866166] E [syncdutils(worker
/urd-gds/gluster):298:log_raise_exception] <top>: execution of
"gluster" failed with ENOENT (No such file or directory)
[2018-08-13 18:26:14.991022] I [repce(agent /urd-gds/gluster):80:service_loop]
RepceServer: terminating on reaching EOF.
[2018-08-13 18:26:15.384844] I [monitor(monitor):272:monitor] Monitor: worker
died in startup phase     brick=/urd-gds/gluster
[2018-08-13 18:26:15.397360] I [gsyncdstatus(monitor):243:set_worker_status]
GeorepStatus: Worker Status Change status=Faulty
Help would be appreciated!
Thanks!
Regards
Marcus Pedersén
________________________________
From: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Marcus Pedersén <marcus.pedersen at slu.se>
Sent: 12 August 2018 22:18
To: khiremat at redhat.com
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours
Hi,
As the geo-replication stopped after 4-5 hours, I added a cron job that stops geo-replication,
pauses for 2 minutes and starts it again, every 6 hours.
The cron job has been running for 5 days and the changelogs have been catching up.
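For reference, the workaround looks roughly like this (a rough sketch; the script and cron file
names are just examples, the volume and slave names are the ones from my setup):

# /etc/cron.d/georep-restart -- run every 6 hours
0 */6 * * * root /usr/local/sbin/georep-restart.sh

# /usr/local/sbin/georep-restart.sh
#!/bin/bash
# Stop geo-replication, wait 2 minutes, then start it again.
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume stop
sleep 120
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume start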
Now a different behaviour has shown up.
On one of the active master nodes I get a Python error.
The other active master node has started to toggle status between Active and Faulty.
See parts of the logs below.
When I read the Troubleshooting Geo-replication documentation, there is a suggestion that when
sync is not complete, you can enforce a full sync of the data by erasing the index and
restarting GlusterFS geo-replication.
There is no explanation of how to erase the index.
Should I enforce a full sync?
How do I erase the index?
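My guess is that "erasing the index" means deleting the session with the sync time reset and
recreating it, roughly like the commands below, but that is only my assumption, so please
correct me if that is not what the docs mean:

gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume stop
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume delete reset-sync-time
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume create push-pem
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume start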
Thanks a lot!
Best regards
Marcus Pedersén
Node with python error:
[2018-08-12 16:02:05.304924] I [resource(worker
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between
master and slave...
[2018-08-12 16:02:06.842832] I [resource(worker
/urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and
slave established.        duration=1.5376
[2018-08-12 16:02:06.843370] I [resource(worker /urd-gds/gluster):1067:connect]
GLUSTER: Mounting gluster volume locally...
[2018-08-12 16:02:07.930706] I [resource(worker /urd-gds/gluster):1090:connect]
GLUSTER: Mounted gluster volume duration=1.0869
[2018-08-12 16:02:07.931536] I [subcmds(worker
/urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2018-08-12 16:02:20.759797] I [master(worker /urd-gds/gluster):1534:register]
_GMaster: Working dir   
path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-12 16:02:20.760411] I [resource(worker
/urd-gds/gluster):1253:service_loop] GLUSTER: Register time     time=1534089740
[2018-08-12 16:02:20.831918] I [gsyncdstatus(worker
/urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change       
status=Active
[2018-08-12 16:02:20.835541] I [gsyncdstatus(worker
/urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change
status=History Crawl
[2018-08-12 16:02:20.836832] I [master(worker /urd-gds/gluster):1448:crawl]
_GMaster: starting history crawl    turns=1 stime=(1523906126, 0)  
entry_stime=None        etime=1534089740
[2018-08-12 16:02:21.848570] I [master(worker /urd-gds/gluster):1477:crawl]
_GMaster: slave's time      stime=(1523906126, 0)
[2018-08-12 16:02:21.950453] E [syncdutils(worker
/urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line
360, in twrap
    tf(*aargs)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line
1880, in syncjob
    po = self.sync_engine(pb, self.log_err)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line
1413, in rsync
    rconf.ssh_ctl_args + \
AttributeError: 'NoneType' object has no attribute 'split'
[2018-08-12 16:02:21.975228] I [repce(agent /urd-gds/gluster):80:service_loop]
RepceServer: terminating on reaching EOF.
[2018-08-12 16:02:22.947170] I [monitor(monitor):272:monitor] Monitor: worker
died in startup phase     brick=/urd-gds/gluster
[2018-08-12 16:02:22.954096] I [gsyncdstatus(monitor):243:set_worker_status]
GeorepStatus: Worker Status Change status=Faulty
[2018-08-12 16:02:32.973948] I [monitor(monitor):158:monitor] Monitor: starting
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-08-12 16:02:33.16155] I [gsyncd(agent /urd-gds/gluster):297:main]
<top>: Using session config file       
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 16:02:33.16882] I [changelogagent(agent
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-12 16:02:33.17292] I [gsyncd(worker /urd-gds/gluster):297:main]
<top>: Using session config file      
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 16:02:33.26951] I [resource(worker
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between
master and slave...
[2018-08-12 16:02:34.642838] I [resource(worker
/urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and
slave established.        duration=1.6156
[2018-08-12 16:02:34.643369] I [resource(worker /urd-gds/gluster):1067:connect]
GLUSTER: Mounting gluster volume locally...
Node that toggles status between active and faulty:
[2018-08-12 19:33:03.475833] I [master(worker /urd-gds/gluster):1885:syncjob]
Syncer: Sync Time Taken   duration=0.2757 num_files=27    job=2   return_code=23
[2018-08-12 19:33:04.818854] I [master(worker /urd-gds/gluster):1885:syncjob]
Syncer: Sync Time Taken   duration=0.3767 num_files=67    job=1   return_code=23
[2018-08-12 19:33:09.926820] E [repce(worker /urd-gds/gluster):197:__call__]
RepceClient: call failed   call=14853:139697829693248:1534102389.64       
method=entry_ops        error=GsyncdError
[2018-08-12 19:33:09.927042] E [syncdutils(worker
/urd-gds/gluster):298:log_raise_exception] <top>: execution of
"gluster" failed with ENOENT (No such file or directory)
[2018-08-12 19:33:09.942267] I [repce(agent /urd-gds/gluster):80:service_loop]
RepceServer: terminating on reaching EOF.
[2018-08-12 19:33:10.349848] I [monitor(monitor):272:monitor] Monitor: worker
died in startup phase     brick=/urd-gds/gluster
[2018-08-12 19:33:10.363173] I [gsyncdstatus(monitor):243:set_worker_status]
GeorepStatus: Worker Status Change status=Faulty
[2018-08-12 19:33:20.386089] I [monitor(monitor):158:monitor] Monitor: starting
gsyncd worker   brick=/urd-gds/gluster  slave_node=urd-gds-geo-000
[2018-08-12 19:33:20.456687] I [gsyncd(agent /urd-gds/gluster):297:main]
<top>: Using session config file      
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 19:33:20.456686] I [gsyncd(worker /urd-gds/gluster):297:main]
<top>: Using session config file     
path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
[2018-08-12 19:33:20.457559] I [changelogagent(agent
/urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
[2018-08-12 19:33:20.511825] I [resource(worker
/urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between
master and slave...
[2018-08-12 19:33:22.88713] I [resource(worker
/urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and
slave established. duration=1.5766
[2018-08-12 19:33:22.89272] I [resource(worker /urd-gds/gluster):1067:connect]
GLUSTER: Mounting gluster volume locally...
[2018-08-12 19:33:23.179249] I [resource(worker /urd-gds/gluster):1090:connect]
GLUSTER: Mounted gluster volume duration=1.0896
[2018-08-12 19:33:23.179805] I [subcmds(worker
/urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful.
Acknowledging back to monitor
[2018-08-12 19:33:35.245277] I [master(worker /urd-gds/gluster):1534:register]
_GMaster: Working dir   
path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
[2018-08-12 19:33:35.246495] I [resource(worker
/urd-gds/gluster):1253:service_loop] GLUSTER: Register time     time=1534102415
[2018-08-12 19:33:35.321988] I [gsyncdstatus(worker
/urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change       
status=Active
[2018-08-12 19:33:35.324270] I [gsyncdstatus(worker
/urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change
status=History Crawl
[2018-08-12 19:33:35.324902] I [master(worker /urd-gds/gluster):1448:crawl]
_GMaster: starting history crawl    turns=1 stime=(1525290650, 0)  
entry_stime=(1525296245, 0)     etime=1534102415
[2018-08-12 19:33:35.328735] I [master(worker /urd-gds/gluster):1477:crawl]
_GMaster: slave's time      stime=(1525290650, 0)
[2018-08-12 19:33:35.574338] I [master(worker /urd-gds/gluster):1301:process]
_GMaster: Skipping already processed entry ops    to_changelog=1525290651
num_changelogs=1        from_changelog=1525290651
[2018-08-12 19:33:35.574448] I [master(worker /urd-gds/gluster):1315:process]
_GMaster: Entry Time Taken        MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0 
CRE=0   duration=0.0000 UNL=0
[2018-08-12 19:33:35.574507] I [master(worker /urd-gds/gluster):1325:process]
_GMaster: Data/Metadata Time Taken        SETA=1  SETX=0  meta_duration=0.0249  
data_duration=0.2156    DATA=0  XATT=0
[2018-08-12 19:33:35.574723] I [master(worker /urd-gds/gluster):1335:process]
_GMaster: Batch Completed changelog_end=1525290651       
entry_stime=(1525296245, 0)     changelog_start=1525290651      stime=(152\
5290650, 0)   duration=0.2455 num_changelogs=1        mode=history_changelog
[2018-08-12 19:33:35.582545] I [master(worker /urd-gds/gluster):1477:crawl]
_GMaster: slave's time      stime=(1525290650, 0)
[2018-08-12 19:33:35.780823] I [master(worker /urd-gds/gluster):1885:syncjob]
Syncer: Sync Time Taken   duration=0.0847 num_files=3     job=2   return_code=23
[2018-08-12 19:33:37.362822] I [master(worker /urd-gds/gluster):1885:syncjob]
Syncer: Sync Time Taken   duration=0.0807 num_files=4     job=2   return_code=23
[2018-08-12 19:33:37.818542] I [master(worker /urd-gds/gluster):1885:syncjob]
Syncer: Sync Time Taken   duration=0.1098 num_files=11    job=1   return_code=23
________________________________
From: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Marcus Pedersén <marcus.pedersen at slu.se>
Sent: 6 August 2018 13:28
To: khiremat at redhat.com
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours
Hi,
Is there a way to resolve the problem with rsync and the hanging processes?
Do I need to kill all the processes and hope that it starts again, or stop/start
geo-replication?
If I stop/start geo-replication it will start again; I have tried it before.
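For clarity, by stop/start I mean roughly these commands, with my volume and slave names filled in:

gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume stop
gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume start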
Regards
Marcus
________________________________
From: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Marcus Pedersén <marcus.pedersen at slu.se>
Sent: 2 August 2018 10:04
To: Kotresh Hiremath Ravishankar
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours
Hi Kotresh,
I get the following and then it hangs:
strace: Process 5921 attached
write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 12811
When sync is running I can see rsync with geouser on the slave node.
Regards
Marcus
################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################
On 2 Aug 2018 09:31, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
Cool, just check whether they are hung by any chance with the following command.
#strace -f -p 5921
On Thu, Aug 2, 2018 at 12:25 PM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:
On both active master nodes there is an rsync process. As in:
root      5921  0.0  0.0 115424  1176 ?        S    Aug01   0:00 rsync -aR0
--inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs
--xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no
-i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-stuphs/bf60c68f1a195dad59573a8dbaa309f2.sock geouser at
urd-gds-geo-001:/proc/13077/cwd
There are also SSH tunnels to the slave nodes and gsyncd.py processes.
Regards
Marcus
################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################
On 2 Aug 2018 08:07, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
Could you check for any rsync processes hung on the master or slave?
On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:
Hi Kotresh,
rsync  version 3.1.2  protocol version 31
All nodes run CentOS 7, updated the last couple of days.
Thanks
Marcus
################
Marcus Pedersén
Systemadministrator
Interbull Centre
################
Sent from my phone
################
On 2 Aug 2018 06:13, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
Hi Marcus,
What's the rsync version being used?
Thanks,
Kotresh HR
On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:
Hi all!
I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.
With help from the list (some symlinks and so on, handled in another thread) I got
geo-replication running.
It ran for 4-5 hours and then stopped; I stopped and started geo-replication and it ran for
another 4-5 hours.
4.1.2 was released and I updated, hoping this would solve the problem.
I still have the same problem: after a start it runs for 4-5 hours and then it stops.
After that nothing happens; I have waited for days but still nothing happens.
I have looked through the logs but cannot find anything obvious.
Geo-replication status shows the same two nodes as Active all the time:
MASTER NODE    MASTER VOL        MASTER BRICK         SLAVE USER    SLAVE       
SLAVE NODE         STATUS     CRAWL STATUS     LAST_SYNCED            ENTRY   
DATA     META    FAILURES    CHECKPOINT TIME        CHECKPOINT COMPLETED   
CHECKPOINT COMPLETION TIME
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
urd-gds-001    urd-gds-volume    /urd-gds/gluster     geouser       geouser at
urd-gds-geo-001::urd-gds-volume    urd-gds-geo-000    Active     History Crawl  
2018-04-16 20:32:09    0        14205    0       0           2018-07-27 21:12:44
No                      N/A
urd-gds-002    urd-gds-volume    /urd-gds/gluster     geouser       geouser at
urd-gds-geo-001::urd-gds-volume    urd-gds-geo-002    Passive    N/A            
N/A                    N/A      N/A      N/A     N/A         N/A                
N/A                     N/A
urd-gds-004    urd-gds-volume    /urd-gds/gluster     geouser       geouser at
urd-gds-geo-001::urd-gds-volume    urd-gds-geo-002    Passive    N/A            
N/A                    N/A      N/A      N/A     N/A         N/A                
N/A                     N/A
urd-gds-003    urd-gds-volume    /urd-gds/gluster     geouser       geouser at
urd-gds-geo-001::urd-gds-volume    urd-gds-geo-000    Active     History Crawl  
2018-05-01 20:58:14    285      4552     0       0           2018-07-27 21:12:44
No                      N/A
urd-gds-000    urd-gds-volume    /urd-gds/gluster1    geouser       geouser at
urd-gds-geo-001::urd-gds-volume    urd-gds-geo-001    Passive    N/A            
N/A                    N/A      N/A      N/A     N/A         N/A                
N/A                     N/A
urd-gds-000    urd-gds-volume    /urd-gds/gluster2    geouser       geouser at
urd-gds-geo-001::urd-gds-volume    urd-gds-geo-001    Passive    N/A            
N/A                    N/A      N/A      N/A     N/A         N/A                
N/A                     N/A
Master cluster is Distribute-Replicate
2 x (2 + 1)
Used space 30TB
Slave cluster is Replicate
1 x (2 + 1)
Used space 9TB
Parts from gsyncd.logs are enclosed.
Thanks a lot!
Best regards
Marcus Pedersén
--
Thanks and Regards,
Kotresh H R
--
Thanks and Regards,
Kotresh H R
--
Thanks and Regards,
Kotresh H R
Hi Marcus,
Can you please share the mount log from the slave? (You can find it at
"/var/log/glusterfs/geo-replication-slaves/<mastervol>hostname<slavevol>/mnt____.log".)
- Sunny