mabi
2017-Apr-08 10:50 UTC
[Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
Hello,

I am using distributed geo replication with two of my GlusterFS 3.7.20 replicated volumes and just noticed that the geo replication for one volume is not working anymore. It has been stuck since 2017-02-23 22:39. I tried stopping and restarting geo replication, but it stays stuck at that specific date and time. Under the DATA field of the geo replication "status detail" command I can see 3879, and the STATUS is "Active", but still nothing happens. I noticed that the rsync process is running but does not do anything, so I ran strace on the PID of rsync and saw the following:

write(2, "rsync: link_stat \"(unreachable)/"..., 114

It looks like rsync can't read or find a file and stays stuck on that. In the geo-replication log files of the GlusterFS master I can't find any error messages, just informational messages. For example, when I restart the geo replication I see the following log entries:

[2017-04-07 21:43:05.664541] I [monitor(monitor):443:distribute] <top>: slave bricks: [{'host': 'gfs1geo.domain', 'dir': '/data/private-geo/brick'}]
[2017-04-07 21:43:05.666435] I [monitor(monitor):468:distribute] <top>: worker specs: [('/data/private/brick', 'ssh://root@gfs1geo.domain:gluster://localhost:private-geo', '1', False)]
[2017-04-07 21:43:05.823931] I [monitor(monitor):267:monitor] Monitor: ------------------------------------------------------------
[2017-04-07 21:43:05.824204] I [monitor(monitor):268:monitor] Monitor: starting gsyncd worker
[2017-04-07 21:43:05.930124] I [gsyncd(/data/private/brick):733:main_i] <top>: syncing: gluster://localhost:private -> ssh://root@gfs1geo.domain:gluster://localhost:private-geo
[2017-04-07 21:43:05.931169] I [changelogagent(agent):73:__init__] ChangelogAgent: Agent listining...
[2017-04-07 21:43:08.558648] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up xsync change detection mode
[2017-04-07 21:43:08.559071] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
[2017-04-07 21:43:08.560163] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up changelog change detection mode
[2017-04-07 21:43:08.560431] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
[2017-04-07 21:43:08.561105] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2017-04-07 21:43:08.561391] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
[2017-04-07 21:43:11.354417] I [master(/data/private/brick):1249:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/private/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Aprivate-geo/616931ac8f39da5dc5834f9d47fc7b1a/xsync
[2017-04-07 21:43:11.354751] I [resource(/data/private/brick):1528:service_loop] GLUSTER: Register time: 1491601391
[2017-04-07 21:43:11.357630] I [master(/data/private/brick):510:crawlwrap] _GMaster: primary master with volume id e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5 ...
[2017-04-07 21:43:11.489355] I [master(/data/private/brick):519:crawlwrap] _GMaster: crawl interval: 1 seconds
[2017-04-07 21:43:11.516710] I [master(/data/private/brick):1163:crawl] _GMaster: starting history crawl... turns: 1, stime: (1487885974, 0), etime: 1491601391
[2017-04-07 21:43:12.607836] I [master(/data/private/brick):1192:crawl] _GMaster: slave's time: (1487885974, 0)

Does anyone know how I can find out the root cause of this problem and make geo replication work again from the point in time where it got stuck?

Many thanks in advance for your help.

Best regards,
Mabi
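The stime and etime values in the history-crawl log line look like Unix epoch seconds, so decoding them is a quick way to check that the log agrees with the stuck date in the status output. The following is a minimal sketch, not part of the original post. It assumes the status output is shown in local time at UTC+1 (a guess based on the one-hour difference); the names STIME, ETIME, ASSUMED_LOCAL_TZ and to_utc are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the "starting history crawl" log line in the post.
STIME = 1487885974  # last time the slave is known to be in sync with
ETIME = 1491601391  # register time of the restarted worker

# Assumption: the status output uses local time at UTC+1.
ASSUMED_LOCAL_TZ = timezone(timedelta(hours=1))


def to_utc(epoch_seconds: int) -> datetime:
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


stime_utc = to_utc(STIME)
etime_utc = to_utc(ETIME)
backlog = etime_utc - stime_utc

print("stime (UTC):        ", stime_utc.isoformat())
print("stime (assumed +1h):", stime_utc.astimezone(ASSUMED_LOCAL_TZ).isoformat())
print("etime (UTC):        ", etime_utc.isoformat())
print("backlog:            ", backlog)
```

Under that assumption the decoded stime is 2017-02-23 21:39:34 UTC, i.e. 22:39:34 at UTC+1, which matches the reported stuck time, and etime matches the 21:43:11 log timestamp. The history crawl therefore has to cover a backlog of roughly 43 days.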
Kotresh Hiremath Ravishankar
2017-Apr-10 04:33 UTC
[Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
Hi Mabi,

What's the rsync version being used?

Thanks and Regards,
Kotresh H R
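One way to answer the version question is to read the first line of `rsync --version` on each node involved (presumably both master and slave, since rsync runs on both ends). The sketch below is only an illustration, not something from the thread. The helper names parse_rsync_version and local_rsync_version are hypothetical, and the regular expression assumes the usual "rsync  version X.Y.Z  protocol version N" banner.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_rsync_version(banner: str) -> Optional[str]:
    """Extract the version number from rsync's version banner, if present."""
    match = re.search(r"rsync\s+version\s+v?(\d+(?:\.\d+)*)", banner)
    return match.group(1) if match else None


def local_rsync_version() -> Optional[str]:
    """Return the version of the rsync binary on PATH, or None if not found."""
    if shutil.which("rsync") is None:
        return None
    result = subprocess.run(
        ["rsync", "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_rsync_version(result.stdout)


if __name__ == "__main__":
    print("local rsync version:", local_rsync_version())
```

Running this on each node and comparing the results would show whether the two ends use different rsync releases.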