Hi Marcus,

Can you please share mount log from slave (You can find it at
"/var/log/glusterfs/geo-replication-slaves/<mastervol>hostname<slavevol>/mnt____.log").

- Sunny
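A minimal sketch of pulling that log, assuming the default log location (the session directory name embeds the master volume, slave host and slave volume, and the exact mnt-*.log file name varies, so list the directory first):

    # on a slave node: find the session directory, then tail its mount log
    ls /var/log/glusterfs/geo-replication-slaves/
    # <session_dir> is a placeholder for whatever the previous command shows
    tail -n 200 /var/log/glusterfs/geo-replication-slaves/<session_dir>/mnt-*.log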
On Tue, Aug 14, 2018 at 12:48 AM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
>
> Hi again,
>
> New changes in behaviour: both master nodes that are active toggle to faulty, and the logs repeat the same over and over again.
>
> Part of log, node1:
>
> [2018-08-13 18:24:44.701711] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-13 18:24:44.704360] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-13 18:24:44.705162] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1523907056, 0) entry_stime=None etime=1534184684
> [2018-08-13 18:24:45.717072] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1523907056, 0)
> [2018-08-13 18:24:45.904958] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=5919:140339726538560:1534184685.88 method=entry_ops error=GsyncdError
> [2018-08-13 18:24:45.905111] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-08-13 18:24:45.919265] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-13 18:24:46.553194] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-13 18:24:46.561784] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-13 18:24:56.581748] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-13 18:24:56.655164] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:24:56.655193] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:24:56.655889] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-13 18:24:56.664628] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-13 18:24:58.347415] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6824
> [2018-08-13 18:24:58.348151] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-13 18:24:59.463598] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.1150
> [2018-08-13 18:24:59.464184] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-13 18:25:01.549007] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-13 18:25:01.549606] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534184701
> [2018-08-13 18:25:01.593946] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
>
> Part of log, node2:
>
> [2018-08-13 18:25:14.554233] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-13 18:25:24.568727] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-13 18:25:24.609642] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:25:24.609678] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-13 18:25:24.610362] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-13 18:25:24.621551] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-13 18:25:26.164855] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5431
> [2018-08-13 18:25:26.165124] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-13 18:25:27.331969] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.1667
> [2018-08-13 18:25:27.335560] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-13 18:25:37.768867] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-13 18:25:37.769479] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534184737
> [2018-08-13 18:25:37.787317] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-13 18:25:37.789822] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-13 18:25:37.790008] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1525290650, 0) entry_stime=(1525296245, 0) etime=1534184737
> [2018-08-13 18:25:37.791222] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-13 18:25:38.63499] I [master(worker /urd-gds/gluster):1301:process] _GMaster: Skipping already processed entry ops to_changelog=1525290651 num_changelogs=1 from_changelog=1525290651
> [2018-08-13 18:25:38.63621] I [master(worker /urd-gds/gluster):1315:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
> [2018-08-13 18:25:38.63678] I [master(worker /urd-gds/gluster):1325:process] _GMaster: Data/Metadata Time Taken SETA=1 SETX=0 meta_duration=0.0228 data_duration=0.2456 DATA=0 XATT=0
> [2018-08-13 18:25:38.63822] I [master(worker /urd-gds/gluster):1335:process] _GMaster: Batch Completed changelog_end=1525290651 entry_stime=(1525296245, 0) changelog_start=1525290651 stime=(1525290650, 0) duration=0.2723 num_changelogs=1 mode=history_changelog
> [2018-08-13 18:25:38.73400] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-13 18:25:38.480941] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1327 num_files=3 job=3 return_code=23
> [2018-08-13 18:25:39.963423] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1133 num_files=8 job=1 return_code=23
> [2018-08-13 18:25:39.980724] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.6315 num_files=47 job=2 return_code=23
>
> ...............
>
> [2018-08-13 18:26:04.534953] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0988 num_files=18 job=2 return_code=23
> [2018-08-13 18:26:07.798583] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.2600 num_files=27 job=2 return_code=23
> [2018-08-13 18:26:08.708100] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.4090 num_files=67 job=2 return_code=23
> [2018-08-13 18:26:14.865883] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=18662:140079998809920:1534184774.58 method=entry_ops error=GsyncdError
> [2018-08-13 18:26:14.866166] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-08-13 18:26:14.991022] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-13 18:26:15.384844] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-13 18:26:15.397360] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>
> Help would be appreciated!
>
> Thanks!
>
> Regards
> Marcus Pedersén
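Two details stand out in these logs. return_code=23 is rsync's "partial transfer due to error" exit code, and each worker death follows an entry_ops failure where executing "gluster" hit ENOENT, i.e. gsyncd could not find the gluster binary on one side of the session. A sketch of what could be checked, assuming this 4.1 build supports the gluster-command-dir and slave-gluster-command-dir session options (verify with the bare config command first):

    # where does the gluster CLI live on the master and slave nodes?
    which gluster    # typically /usr/sbin/gluster on CentOS 7
    # inspect the session config for the command paths gsyncd will use
    gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume config | grep -i command
    # if wrong or unset, point the workers at the right directory
    gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume config gluster-command-dir /usr/sbin/
    gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume config slave-gluster-command-dir /usr/sbin/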
> ________________________________
> From: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Marcus Pedersén <marcus.pedersen at slu.se>
> Sent: 12 August 2018 22:18
> To: khiremat at redhat.com
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours
>
> Hi,
>
> As the geo-replication stopped after 4-5 hours, I added a cron job that stopped geo-replication, paused for 2 mins and started it again every 6 hours.
>
> The cron job has been running for 5 days and the changelogs have been catching up.
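A workaround cron entry of that shape might look like the sketch below (the stop/start syntax and session name are the ones used elsewhere in this thread; the 2-minute pause is a plain sleep):

    # /etc/cron.d/georep-restart (sketch): every 6 hours stop, wait 2 min, start
    0 */6 * * * root gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume stop && sleep 120 && gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume start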
> Now a different behavior has shown up.
>
> In one of the active master nodes I get a python error.
>
> The other active master node has started to toggle status between active and faulty.
>
> See parts of logs below.
>
> When I read Troubleshooting Geo-replication, there is a suggestion, when sync is not complete, to enforce a full sync of the data by erasing the index and restarting GlusterFS geo-replication.
>
> There is no explanation of how to erase the index.
>
> Should I enforce a full sync?
>
> How do I erase the index?
>
> Thanks a lot!
>
> Best regards
> Marcus Pedersén
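For reference, the 4.x CLI has a way to force a full resync without erasing anything by hand: deleting the session with reset-sync-time clears the stored sync time, so a recreated session starts from the beginning. A sketch only; this re-crawls and re-compares the entire volume, so check the release documentation before running it:

    gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume stop
    gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume delete reset-sync-time
    # recreate the session; push-pem assumes the passwordless SSH setup is still in place
    gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume create push-pem
    gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume start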
> Node with python error:
>
> [2018-08-12 16:02:05.304924] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-12 16:02:06.842832] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5376
> [2018-08-12 16:02:06.843370] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-12 16:02:07.930706] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0869
> [2018-08-12 16:02:07.931536] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-12 16:02:20.759797] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-12 16:02:20.760411] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534089740
> [2018-08-12 16:02:20.831918] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-12 16:02:20.835541] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-12 16:02:20.836832] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1523906126, 0) entry_stime=None etime=1534089740
> [2018-08-12 16:02:21.848570] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1523906126, 0)
> [2018-08-12 16:02:21.950453] E [syncdutils(worker /urd-gds/gluster):330:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 360, in twrap
>     tf(*aargs)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1880, in syncjob
>     po = self.sync_engine(pb, self.log_err)
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1413, in rsync
>     rconf.ssh_ctl_args + \
> AttributeError: 'NoneType' object has no attribute 'split'
> [2018-08-12 16:02:21.975228] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-12 16:02:22.947170] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-12 16:02:22.954096] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-12 16:02:32.973948] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-12 16:02:33.16155] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 16:02:33.16882] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-12 16:02:33.17292] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 16:02:33.26951] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-12 16:02:34.642838] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.6156
> [2018-08-12 16:02:34.643369] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
>
> Node that toggles status between active and faulty:
>
> [2018-08-12 19:33:03.475833] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.2757 num_files=27 job=2 return_code=23
> [2018-08-12 19:33:04.818854] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.3767 num_files=67 job=1 return_code=23
> [2018-08-12 19:33:09.926820] E [repce(worker /urd-gds/gluster):197:__call__] RepceClient: call failed call=14853:139697829693248:1534102389.64 method=entry_ops error=GsyncdError
> [2018-08-12 19:33:09.927042] E [syncdutils(worker /urd-gds/gluster):298:log_raise_exception] <top>: execution of "gluster" failed with ENOENT (No such file or directory)
> [2018-08-12 19:33:09.942267] I [repce(agent /urd-gds/gluster):80:service_loop] RepceServer: terminating on reaching EOF.
> [2018-08-12 19:33:10.349848] I [monitor(monitor):272:monitor] Monitor: worker died in startup phase brick=/urd-gds/gluster
> [2018-08-12 19:33:10.363173] I [gsyncdstatus(monitor):243:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2018-08-12 19:33:20.386089] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/urd-gds/gluster slave_node=urd-gds-geo-000
> [2018-08-12 19:33:20.456687] I [gsyncd(agent /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 19:33:20.456686] I [gsyncd(worker /urd-gds/gluster):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.conf
> [2018-08-12 19:33:20.457559] I [changelogagent(agent /urd-gds/gluster):72:__init__] ChangelogAgent: Agent listining...
> [2018-08-12 19:33:20.511825] I [resource(worker /urd-gds/gluster):1348:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2018-08-12 19:33:22.88713] I [resource(worker /urd-gds/gluster):1395:connect_remote] SSH: SSH connection between master and slave established. duration=1.5766
> [2018-08-12 19:33:22.89272] I [resource(worker /urd-gds/gluster):1067:connect] GLUSTER: Mounting gluster volume locally...
> [2018-08-12 19:33:23.179249] I [resource(worker /urd-gds/gluster):1090:connect] GLUSTER: Mounted gluster volume duration=1.0896
> [2018-08-12 19:33:23.179805] I [subcmds(worker /urd-gds/gluster):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2018-08-12 19:33:35.245277] I [master(worker /urd-gds/gluster):1534:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/urd-gds-gluster
> [2018-08-12 19:33:35.246495] I [resource(worker /urd-gds/gluster):1253:service_loop] GLUSTER: Register time time=1534102415
> [2018-08-12 19:33:35.321988] I [gsyncdstatus(worker /urd-gds/gluster):276:set_active] GeorepStatus: Worker Status Change status=Active
> [2018-08-12 19:33:35.324270] I [gsyncdstatus(worker /urd-gds/gluster):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
> [2018-08-12 19:33:35.324902] I [master(worker /urd-gds/gluster):1448:crawl] _GMaster: starting history crawl turns=1 stime=(1525290650, 0) entry_stime=(1525296245, 0) etime=1534102415
> [2018-08-12 19:33:35.328735] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-12 19:33:35.574338] I [master(worker /urd-gds/gluster):1301:process] _GMaster: Skipping already processed entry ops to_changelog=1525290651 num_changelogs=1 from_changelog=1525290651
> [2018-08-12 19:33:35.574448] I [master(worker /urd-gds/gluster):1315:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0
> [2018-08-12 19:33:35.574507] I [master(worker /urd-gds/gluster):1325:process] _GMaster: Data/Metadata Time Taken SETA=1 SETX=0 meta_duration=0.0249 data_duration=0.2156 DATA=0 XATT=0
> [2018-08-12 19:33:35.574723] I [master(worker /urd-gds/gluster):1335:process] _GMaster: Batch Completed changelog_end=1525290651 entry_stime=(1525296245, 0) changelog_start=1525290651 stime=(1525290650, 0) duration=0.2455 num_changelogs=1 mode=history_changelog
> [2018-08-12 19:33:35.582545] I [master(worker /urd-gds/gluster):1477:crawl] _GMaster: slave's time stime=(1525290650, 0)
> [2018-08-12 19:33:35.780823] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0847 num_files=3 job=2 return_code=23
> [2018-08-12 19:33:37.362822] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.0807 num_files=4 job=2 return_code=23
> [2018-08-12 19:33:37.818542] I [master(worker /urd-gds/gluster):1885:syncjob] Syncer: Sync Time Taken duration=0.1098 num_files=11 job=1 return_code=23
>
> ________________________________
> From: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Marcus Pedersén <marcus.pedersen at slu.se>
> Sent: 6 August 2018 13:28
> To: khiremat at redhat.com
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours
>
> Hi,
>
> Is there a way to resolve the problem with rsync and hanging processes?
>
> Do I need to kill all the processes and hope that it starts again, or stop/start geo-replication?
>
> If I stop/start geo-replication it will start again, I have tried it before.
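A quick way to look for hung rsync/gsyncd processes before killing anything (a sketch; the strace step is the same one suggested further down in this thread):

    # list geo-replication worker processes on master and slave nodes
    ps aux | grep -E '[r]sync|[g]syncd'
    # STAT column: D (uninterruptible sleep) usually points at a stuck mount or network
    ps -o pid,stat,wchan=,cmd -C rsync
    # attach to a suspect PID; a long silence with no syscalls suggests a hang
    strace -f -p <PID>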
> Regards
> Marcus
>
> ________________________________
> From: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Marcus Pedersén <marcus.pedersen at slu.se>
> Sent: 2 August 2018 10:04
> To: Kotresh Hiremath Ravishankar
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours
>
> Hi Kotresh,
>
> I get the following and then it hangs:
>
> strace: Process 5921 attached
> write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 12811
>
> When sync is running I can see rsync with geouser on the slave node.
>
> Regards
> Marcus
>
> On 2 Aug 2018 09:31, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>
> Cool, just check whether they are hung by any chance with the following command.
>
> # strace -f -p 5921
>
> On Thu, Aug 2, 2018 at 12:25 PM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:
>
> On both active master nodes there is an rsync process. As in:
>
> root 5921 0.0 0.0 115424 1176 ? S Aug01 0:00 rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-stuphs/bf60c68f1a195dad59573a8dbaa309f2.sock geouser at urd-gds-geo-001:/proc/13077/cwd
>
> There are also ssh tunnels to slave nodes and gsyncd.py processes.
>
> Regards
> Marcus
>
> On 2 Aug 2018 08:07, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>
> Could you look for any rsync processes hung in master or slave?
>
> On Thu, Aug 2, 2018 at 11:18 AM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:
>
> Hi Kotresh,
>
> rsync version 3.1.2 protocol version 31
> All nodes run CentOS 7, updated the last couple of days.
>
> Thanks
> Marcus
>
> On 2 Aug 2018 06:13, Kotresh Hiremath Ravishankar <khiremat at redhat.com> wrote:
>
> Hi Marcus,
>
> What's the rsync version being used?
>
> Thanks,
> Kotresh HR
>
> On Thu, Aug 2, 2018 at 1:48 AM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:
>
> Hi all!
>
> I upgraded from 3.12.9 to 4.1.1 and had problems with geo-replication.
>
> With help from the list with some symlinks and so on (handled in another thread) I got the geo-replication running.
>
> It ran for 4-5 hours and then stopped; I stopped and started geo-replication and it ran for another 4-5 hours.
>
> 4.1.2 was released and I updated, hoping this would solve the problem.
>
> I still have the same problem: at start it runs for 4-5 hours and then it stops.
>
> After that nothing happens; I have waited for days but still nothing happens.
>
> I have looked through logs but cannot find anything obvious.
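One way to scan the worker logs instead of reading them end to end (a sketch; the master-side log path is assumed to mirror the session directory name visible in the logs above):

    LOG=/var/log/glusterfs/geo-replication/urd-gds-volume_urd-gds-geo-001_urd-gds-volume/gsyncd.log
    # last error-level lines
    grep ' E \[' "$LOG" | tail -n 20
    # distribution of rsync return codes (23 = partial transfer due to error)
    grep -o 'return_code=[0-9]*' "$LOG" | sort | uniq -c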
> Status for geo-replication is active for the two same nodes all the time:
>
> MASTER NODE    MASTER VOL        MASTER BRICK         SLAVE USER    SLAVE                                       SLAVE NODE         STATUS     CRAWL STATUS     LAST_SYNCED            ENTRY    DATA     META    FAILURES    CHECKPOINT TIME        CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> urd-gds-001    urd-gds-volume    /urd-gds/gluster     geouser       geouser at urd-gds-geo-001::urd-gds-volume    urd-gds-geo-000    Active     History Crawl    2018-04-16 20:32:09    0        14205    0       0           2018-07-27 21:12:44    No                      N/A
> urd-gds-002    urd-gds-volume    /urd-gds/gluster     geouser       geouser at urd-gds-geo-001::urd-gds-volume    urd-gds-geo-002    Passive    N/A              N/A                    N/A      N/A      N/A     N/A         N/A                    N/A                     N/A
> urd-gds-004    urd-gds-volume    /urd-gds/gluster     geouser       geouser at urd-gds-geo-001::urd-gds-volume    urd-gds-geo-002    Passive    N/A              N/A                    N/A      N/A      N/A     N/A         N/A                    N/A                     N/A
> urd-gds-003    urd-gds-volume    /urd-gds/gluster     geouser       geouser at urd-gds-geo-001::urd-gds-volume    urd-gds-geo-000    Active     History Crawl    2018-05-01 20:58:14    285      4552     0       0           2018-07-27 21:12:44    No                      N/A
> urd-gds-000    urd-gds-volume    /urd-gds/gluster1    geouser       geouser at urd-gds-geo-001::urd-gds-volume    urd-gds-geo-001    Passive    N/A              N/A                    N/A      N/A      N/A     N/A         N/A                    N/A                     N/A
> urd-gds-000    urd-gds-volume    /urd-gds/gluster2    geouser       geouser at urd-gds-geo-001::urd-gds-volume    urd-gds-geo-001    Passive    N/A              N/A                    N/A      N/A      N/A     N/A         N/A                    N/A                     N/A
>
> Master cluster is Distribute-Replicate
> 2 x (2 + 1)
> Used space 30TB
>
> Slave cluster is Replicate
> 1 x (2 + 1)
> Used space 9TB
>
> Parts from gsyncd.logs are enclosed.
>
> Thanks a lot!
>
> Best regards
> Marcus Pedersén
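Whether anything is moving at all can be watched from the status counters; status detail adds per-brick entry/data/meta queue numbers to the table above (same session name as in the status output):

    watch -n 60 'gluster volume geo-replication urd-gds-volume geouser@urd-gds-geo-001::urd-gds-volume status detail'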
>
> --
> Thanks and Regards,
> Kotresh H R
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
Marcus Pedersén
2018-Aug-13 20:45 UTC
[Gluster-users] Geo-replication stops after 4-5 hours
Hi Sunny,

Please find the enclosed mount logs for the two active master nodes.
I cut them down to today's logs.

Thanks!

Marcus
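Since gsyncd log lines begin with a UTC date stamp, a day's worth can be cut out of a mount log with a plain grep; a sketch, with the source file name inferred from the attachment names at the end of this message:

    grep '^\[2018-08-13' mnt-urd-gds-001-urd-gds-gluster.log | gzip > today_mnt-urd-gds-001-urd-gds-gluster.log.gz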
________________________________________
From: Sunny Kumar <sunkumar at redhat.com>
Sent: 13 August 2018 21:49
To: Marcus Pedersén
Cc: Kotresh Hiremath Ravishankar; gluster-users at gluster.org
Subject: Re: [Gluster-users] Geo-replication stops after 4-5 hours

Hi Marcus,

Can you please share mount log from slave (You can find it at
"/var/log/glusterfs/geo-replication-slaves/<mastervol>hostname<slavevol>/mnt____.log").

- Sunny
---
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: today_mnt-urd-gds-001-urd-gds-gluster.log.gz
Type: application/gzip
Size: 1154878 bytes
Desc: today_mnt-urd-gds-001-urd-gds-gluster.log.gz
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20180813/c38ab5de/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: today_mnt-urd-gds-003-urd-gds-gluster.log.gz
Type: application/gzip
Size: 533166 bytes
Desc: today_mnt-urd-gds-003-urd-gds-gluster.log.gz
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20180813/c38ab5de/attachment-0003.bin>