mabi
2017-Apr-10 20:45 UTC
[Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
Hi Kotresh,

I am using the official Debian 8 (jessie) package, which has rsync version 3.1.1.

Regards,
M.

-------- Original Message --------
Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
Local Time: April 10, 2017 6:33 AM
UTC Time: April 10, 2017 4:33 AM
From: khiremat at redhat.com
To: mabi <mabi at protonmail.ch>, Gluster Users <gluster-users at gluster.org>

Hi Mabi,

What's the rsync version being used?

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "mabi" <mabi at protonmail.ch>
> To: "Gluster Users" <gluster-users at gluster.org>
> Sent: Saturday, April 8, 2017 4:20:25 PM
> Subject: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
>
> Hello,
>
> I am using distributed geo-replication with two of my GlusterFS 3.7.20
> replicated volumes and just noticed that geo-replication for one volume
> is no longer working. It has been stuck since 2017-02-23 22:39. I tried
> stopping and restarting geo-replication, but it stays stuck at that
> specific date and time. Under the DATA field of the geo-replication
> "status detail" command I can see 3879, and the STATUS is "Active", but
> still nothing happens. I noticed that the rsync process is running but
> does not do anything, so I ran strace on the PID of rsync and saw the
> following:
>
> write(2, "rsync: link_stat \"(unreachable)/"..., 114
>
> It looks like rsync can't read or find a file and stays stuck on that.
> In the geo-replication log files on the GlusterFS master I can't find
> any error messages, just informational messages.
> For example, when I restart geo-replication I see the following log entries:
>
> [2017-04-07 21:43:05.664541] I [monitor(monitor):443:distribute] <top>: slave bricks: [{'host': 'gfs1geo.domain', 'dir': '/data/private-geo/brick'}]
> [2017-04-07 21:43:05.666435] I [monitor(monitor):468:distribute] <top>: worker specs: [('/data/private/brick', 'ssh://root at gfs1geo.domain:gluster://localhost:private-geo', '1', False)]
> [2017-04-07 21:43:05.823931] I [monitor(monitor):267:monitor] Monitor: ------------------------------------------------------------
> [2017-04-07 21:43:05.824204] I [monitor(monitor):268:monitor] Monitor: starting gsyncd worker
> [2017-04-07 21:43:05.930124] I [gsyncd(/data/private/brick):733:main_i] <top>: syncing: gluster://localhost:private -> ssh://root at gfs1geo.domain:gluster://localhost:private-geo
> [2017-04-07 21:43:05.931169] I [changelogagent(agent):73:__init__] ChangelogAgent: Agent listining...
> [2017-04-07 21:43:08.558648] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up xsync change detection mode
> [2017-04-07 21:43:08.559071] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
> [2017-04-07 21:43:08.560163] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up changelog change detection mode
> [2017-04-07 21:43:08.560431] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
> [2017-04-07 21:43:08.561105] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
> [2017-04-07 21:43:08.561391] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
> [2017-04-07 21:43:11.354417] I [master(/data/private/brick):1249:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/private/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Aprivate-geo/616931ac8f39da5dc5834f9d47fc7b1a/xsync
> [2017-04-07 21:43:11.354751] I [resource(/data/private/brick):1528:service_loop] GLUSTER: Register time: 1491601391
> [2017-04-07 21:43:11.357630] I [master(/data/private/brick):510:crawlwrap] _GMaster: primary master with volume id e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5 ...
> [2017-04-07 21:43:11.489355] I [master(/data/private/brick):519:crawlwrap] _GMaster: crawl interval: 1 seconds
> [2017-04-07 21:43:11.516710] I [master(/data/private/brick):1163:crawl] _GMaster: starting history crawl... turns: 1, stime: (1487885974, 0), etime: 1491601391
> [2017-04-07 21:43:12.607836] I [master(/data/private/brick):1192:crawl] _GMaster: slave's time: (1487885974, 0)
>
> Does anyone know how I can find out the root cause of this problem and
> make geo-replication work again from the point in time where it got stuck?
>
> Many thanks in advance for your help.
>
> Best regards,
> Mabi
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
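The `stime: (1487885974, 0)` tuple in the history-crawl log above is the slave's sync point as a Unix epoch (seconds, nanoseconds); decoding it confirms the crawl is frozen at the 2017-02-23 date reported. A quick sketch, assuming GNU coreutils `date`; the strace/procfs commands are the same technique described in the message, with a hypothetical rsync PID:

```shell
# Decode the stime from the log line
#   starting history crawl... turns: 1, stime: (1487885974, 0)
# The first element is seconds since the Unix epoch:
date -u -d @1487885974 '+%Y-%m-%d %H:%M:%S UTC'
# -> 2017-02-23 21:39:34 UTC  (the point the slave's sync is stuck at)

# To see which path the stuck rsync is choking on, attach to its PID
# (hypothetical placeholder, requires root) and inspect its open files:
#   strace -p <rsync-pid> -s 256
#   ls -l /proc/<rsync-pid>/fd /proc/<rsync-pid>/cwd
```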
Kotresh Hiremath Ravishankar
2017-Apr-11 07:18 UTC
[Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
Hi,

Then please set the following rsync config and let us know if it helps:

gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config rsync-options "--ignore-missing-args"

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "mabi" <mabi at protonmail.ch>
> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> Cc: "Gluster Users" <gluster-users at gluster.org>
> Sent: Tuesday, April 11, 2017 2:15:54 AM
> Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
>
> Hi Kotresh,
>
> I am using the official Debian 8 (jessie) package, which has rsync
> version 3.1.1.
>
> Regards,
> M.
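Spelled out with the volume and host names that appear earlier in this thread (master volume `private`, slave `gfs1geo.domain::private-geo`; substitute your own session names), the suggested fix is a sketch like this:

```shell
# --ignore-missing-args (available in rsync >= 3.1.0, so the reported 3.1.1
# qualifies) makes rsync skip source files that vanished between the
# changelog scan and the transfer, instead of stalling on link_stat errors.
gluster volume geo-replication private gfs1geo.domain::private-geo \
    config rsync-options "--ignore-missing-args"

# Then restart the session and check whether the crawl advances past the
# stuck stime:
gluster volume geo-replication private gfs1geo.domain::private-geo stop
gluster volume geo-replication private gfs1geo.domain::private-geo start
gluster volume geo-replication private gfs1geo.domain::private-geo status detail
```

These commands need a live GlusterFS cluster with the geo-replication session already established; they are shown here only to make the placeholder syntax concrete.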