thr3ads.net - Gluster users - [Gluster-users] Geo replication stuck (rsync: link


Kotresh Hiremath Ravishankar

2017-Apr-11 07:18 UTC

[Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")

Hi,

Then please set the following rsync config option and let us know if it helps.

gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config rsync-options "--ignore-missing-args"
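Here is a minimal sketch of applying this suggestion and then restarting the session. It assumes a POSIX shell and uses `gluster volume geo-replication`, the long form of the `gluster vol geo-rep` command above. The function name `set_ignore_missing_args` is hypothetical. The stop/start step mirrors the restart mabi reports doing later in the thread.

```shell
# Hypothetical helper: apply the suggested rsync option to one geo-replication
# session and restart it. Usage: set_ignore_missing_args MASTERVOL SLAVEHOST::SLAVEVOL
set_ignore_missing_args() {
    master=$1
    slave=$2
    # Store the extra rsync option in the session's configuration.
    gluster volume geo-replication "$master" "$slave" \
        config rsync-options "--ignore-missing-args" || return
    # Print the stored value back to confirm it was saved.
    gluster volume geo-replication "$master" "$slave" config rsync-options || return
    # Restart the session so its workers start rsync again with the new option.
    gluster volume geo-replication "$master" "$slave" stop || return
    gluster volume geo-replication "$master" "$slave" start
}
```

With the names from mabi's log, the call would be `set_ignore_missing_args private gfs1geo.domain::private-geo`. After that, `gluster volume geo-replication private gfs1geo.domain::private-geo status detail` shows whether the session leaves its stuck state.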

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "mabi" <mabi at protonmail.ch>
> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> Cc: "Gluster Users" <gluster-users at gluster.org>
> Sent: Tuesday, April 11, 2017 2:15:54 AM
> Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
> 
> Hi Kotresh,
> 
> I am using the official Debian 8 (jessie) package which has rsync version
> 3.1.1.
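The version probably matters because `--ignore-missing-args` was first added in rsync 3.1.0, so 3.1.1 should support it. Below is a small sketch for checking this on a node. It assumes GNU `sort -V` and the usual first line of `rsync --version` ("rsync  version X.Y.Z  protocol version N"). Both function names are hypothetical.

```shell
# Hypothetical helper: print the installed rsync version, e.g. "3.1.1".
# Assumes the first line of `rsync --version` looks like
# "rsync  version 3.1.1  protocol version 31" (third field = version).
rsync_version() {
    rsync --version | awk 'NR == 1 { print $3 }'
}

# Hypothetical helper: succeed if version $1 >= version $2 (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge "$(rsync_version)" 3.1.0` exits with status 0 on a node whose rsync should accept the option.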
> 
> Regards,
> M.
> 
> -------- Original Message --------
> Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat
> "(unreachable)")
> Local Time: April 10, 2017 6:33 AM
> UTC Time: April 10, 2017 4:33 AM
> From: khiremat at redhat.com
> To: mabi <mabi at protonmail.ch>
> Gluster Users <gluster-users at gluster.org>
> 
> Hi Mabi,
> 
> What's the rsync version being used?
> 
> Thanks and Regards,
> Kotresh H R
> 
> ----- Original Message -----
> > From: "mabi" <mabi at protonmail.ch>
> > To: "Gluster Users" <gluster-users at gluster.org>
> > Sent: Saturday, April 8, 2017 4:20:25 PM
> > Subject: [Gluster-users] Geo replication stuck (rsync: link_stat
> > "(unreachable)")
> >
> > Hello,
> >
> > I am using distributed geo-replication with two of my GlusterFS 3.7.20
> > replicated volumes and just noticed that geo-replication for one volume
> > is not working anymore. It has been stuck since 2017-02-23 22:39. I tried
> > to stop and restart geo-replication, but it still stays stuck at that
> > specific date and time. In the output of the geo-replication "status
> > detail" command I can see 3879 under the DATA field and "Active" as
> > STATUS, but still nothing happens. I noticed that the rsync process is
> > running but not doing anything, so I ran strace on the rsync PID and saw
> > the following:
> >
> > write(2, "rsync: link_stat \"(unreachable)/"..., 114
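By default strace prints only the first 32 characters of each string argument. That is why the message above stops right after "(unreachable)/", even though the write is 114 bytes long. The sketch below re-attaches with a larger string limit so the full path rsync complains about becomes visible. `trace_rsync_writes` is a hypothetical helper. `-s` and `-e trace=write` are standard strace options.

```shell
# Hypothetical helper: attach to a running rsync process and show its write()
# calls with up to 4096 characters per string instead of strace's default 32,
# so the full path after "(unreachable)/" is printed.
# Usage: trace_rsync_writes PID   (stop with Ctrl-C; usually needs root)
trace_rsync_writes() {
    strace -s 4096 -e trace=write -p "$1"
}
```

Several rsync processes may be running. `trace_rsync_writes "$(pgrep -o -x rsync)"` picks the oldest one, but other PIDs listed by `pgrep -x rsync` may be the relevant ones.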
> >
> > It looks like rsync can't read or find a file and stays stuck on that. In
> > the geo-replication log files of the GlusterFS master I can't find any
> > error messages, just informational messages. For example, when I restart
> > geo-replication I see the following log entries:
> >
> > [2017-04-07 21:43:05.664541] I [monitor(monitor):443:distribute] <top>: slave bricks: [{'host': 'gfs1geo.domain', 'dir': '/data/private-geo/brick'}]
> > [2017-04-07 21:43:05.666435] I [monitor(monitor):468:distribute] <top>: worker specs: [('/data/private/brick', 'ssh://root at gfs1geo.domain:gluster://localhost:private-geo', '1', False)]
> > [2017-04-07 21:43:05.823931] I [monitor(monitor):267:monitor] Monitor: ------------------------------------------------------------
> > [2017-04-07 21:43:05.824204] I [monitor(monitor):268:monitor] Monitor: starting gsyncd worker
> > [2017-04-07 21:43:05.930124] I [gsyncd(/data/private/brick):733:main_i] <top>: syncing: gluster://localhost:private -> ssh://root at gfs1geo.domain:gluster://localhost:private-geo
> > [2017-04-07 21:43:05.931169] I [changelogagent(agent):73:__init__] ChangelogAgent: Agent listining...
> > [2017-04-07 21:43:08.558648] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up xsync change detection mode
> > [2017-04-07 21:43:08.559071] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
> > [2017-04-07 21:43:08.560163] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up changelog change detection mode
> > [2017-04-07 21:43:08.560431] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
> > [2017-04-07 21:43:08.561105] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
> > [2017-04-07 21:43:08.561391] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
> > [2017-04-07 21:43:11.354417] I [master(/data/private/brick):1249:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/private/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Aprivate-geo/616931ac8f39da5dc5834f9d47fc7b1a/xsync
> > [2017-04-07 21:43:11.354751] I [resource(/data/private/brick):1528:service_loop] GLUSTER: Register time: 1491601391
> > [2017-04-07 21:43:11.357630] I [master(/data/private/brick):510:crawlwrap] _GMaster: primary master with volume id e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5
> > ...
> > [2017-04-07 21:43:11.489355] I [master(/data/private/brick):519:crawlwrap] _GMaster: crawl interval: 1 seconds
> > [2017-04-07 21:43:11.516710] I [master(/data/private/brick):1163:crawl] _GMaster: starting history crawl... turns: 1, stime: (1487885974, 0), etime: 1491601391
> > [2017-04-07 21:43:12.607836] I [master(/data/private/brick):1192:crawl] _GMaster: slave's time: (1487885974, 0)
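The stime, etime and register-time values in these entries are Unix timestamps in seconds. Below is a sketch for converting them. It assumes GNU `date`, as shipped with Debian. `stime_to_utc` is a hypothetical helper.

```shell
# Hypothetical helper: convert a geo-replication epoch value (seconds) to UTC.
# Relies on GNU date's "-d @SECONDS" syntax.
stime_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

stime_to_utc 1487885974   # stime / slave's time  -> 2017-02-23 21:39:34
stime_to_utc 1491601391   # etime / register time -> 2017-04-07 21:43:11
```

The stime is presumably the 22:39 local time (CET, UTC+1) at which the session appears stuck. The etime matches the timestamp of this restart. So each restart begins the history crawl from the stuck point, and the slave's time in the last entry has not moved past it.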
> >
> > Does anyone know how I can find the root cause of this problem and make
> > geo-replication work again from the point in time where it got stuck?
> >
> > Many thanks in advance for your help.
> >
> > Best regards,
> > Mabi
> >
> >
> >
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users

mabi

2017-Apr-12 18:58 UTC


[Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")

Hi Kotresh,

Thanks for your hint. Adding the "--ignore-missing-args" option to rsync
and restarting geo-replication worked, but it only managed to sync
approximately 1/3 of the data before geo-replication went into the
"Failed" status this time. Now I have a different type of error, as you
can see in the log extract below from my geo-replication slave node:

[2017-04-12 18:01:55.268923] I [MSGID: 109066] [dht-rename.c:1574:dht_rename] 0-myvol-private-geo-dht: renaming /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls.ocTransferId2118183895.part (hash=myvol-private-geo-client-0/cache=myvol-private-geo-client-0) => /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls (hash=myvol-private-geo-client-0/cache=myvol-private-geo-client-0)
[2017-04-12 18:01:55.269842] W [fuse-bridge.c:1787:fuse_rename_cbk] 0-glusterfs-fuse: 4786: /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls.ocTransferId2118183895.part -> /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls => -1 (Directory not empty)
[2017-04-12 18:01:55.314062] I [fuse-bridge.c:5016:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-PNSR8s
[2017-04-12 18:01:55.314311] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064) [0x7f97d3129064] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f97d438a725] -->/usr/sbin/glusterfs(cleanup_and_exit+0x57) [0x7f97d438a5a7] ) 0-: received signum (15), shutting down
[2017-04-12 18:01:55.314335] I [fuse-bridge.c:5720:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-PNSR8s'.
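The warning ends in "Directory not empty", the standard message for ENOTEMPTY. rename(2) returns this error when the destination is a directory that still contains entries. If the slave's aux mount passes that errno through unchanged, the target "Workhours_2017 empty.xls" probably exists on the slave as a non-empty directory. Below is a local reproduction of the same errno on an ordinary filesystem, not on Gluster. `demo_enotempty` is a hypothetical helper, and `mv -T` is GNU-specific.

```shell
# Hypothetical demo (local disk, not Gluster): rename a directory onto a
# non-empty directory to trigger ENOTEMPTY ("Directory not empty").
# Prints mv's error message and returns mv's (non-zero) exit status.
demo_enotempty() {
    tmp=$(mktemp -d) || return 2
    mkdir "$tmp/source" "$tmp/target"
    : > "$tmp/target/leftover"   # one file makes the target non-empty
    # GNU mv -T treats the target as the destination name itself, so mv calls
    # rename(2) on it rather than moving "source" inside "target".
    LC_ALL=C mv -T "$tmp/source" "$tmp/target" 2>&1
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Running `demo_enotempty` should print an mv error that ends in "Directory not empty".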

How can I fix this issue now and get geo-replication synchronising again?

Best regards,
M.
