thr3ads.net - Gluster users - [Gluster-users] Geo replication stuck (rsync: link

Kotresh Hiremath Ravishankar

2017-Apr-13 05:57 UTC

[Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")

Hi,

I think the directory Workhours_2017 was deleted on the master, and the delete
is failing on the slave because there might be stale linkto files on the back
end. These issues are fixed in DHT in the latest versions. Upgrading to the
latest version would solve these issues.

To work around the issue, you might need to clean up the problematic
directory on the slave from the back end.

Thanks and Regards,
Kotresh H R
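
For readers following along: DHT linkto files on a brick are zero-byte files whose mode is only the sticky bit (shown as ---------T) and which carry the trusted.glusterfs.dht.linkto extended attribute. Below is a minimal shell sketch for listing candidates on the slave brick, assuming GNU find. The function name list_linkto_candidates is hypothetical and not part of GlusterFS.

```shell
#!/bin/sh
# List files under a brick that look like DHT linkto files:
# regular, zero bytes, and mode exactly 1000 (sticky bit only).
# The .glusterfs directory is skipped to keep the listing readable.
# This only lists candidates; it does not delete anything.
list_linkto_candidates() {
    brick=${1%/}    # drop a trailing slash so the -path test matches
    find "$brick" -path "$brick/.glusterfs" -prune -o \
        -type f -perm 1000 -size 0 -print
}
```

Run it as root on the slave node, for example `list_linkto_candidates /data/private-geo/brick` (the brick path shown in the slave bricks log entry of this thread). Each hit can be confirmed with `getfattr -n trusted.glusterfs.dht.linkto -e text <file>`. Whether a given linkto file is actually stale still has to be judged by hand.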

----- Original Message -----
> From: "mabi" <mabi at protonmail.ch>
> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> Cc: "Gluster Users" <gluster-users at gluster.org>
> Sent: Thursday, April 13, 2017 12:28:50 AM
> Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")
> 
> Hi Kotresh,
> 
> Thanks for your hint. Adding the "--ignore-missing-args" option to rsync and
> restarting geo-replication worked, but it only managed to sync approximately
> 1/3 of the data before it put the geo replication in status "Failed" this
> time. Now I have a different type of error, as you can see below from the log
> extract on my geo replication slave node:
> 
> [2017-04-12 18:01:55.268923] I [MSGID: 109066] [dht-rename.c:1574:dht_rename] 0-myvol-private-geo-dht: renaming /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls.ocTransferId2118183895.part (hash=myvol-private-geo-client-0/cache=myvol-private-geo-client-0) => /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls (hash=myvol-private-geo-client-0/cache=myvol-private-geo-client-0)
> [2017-04-12 18:01:55.269842] W [fuse-bridge.c:1787:fuse_rename_cbk] 0-glusterfs-fuse: 4786: /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls.ocTransferId2118183895.part -> /.gfid/1678ff37-f708-4197-bed0-3ecd87ae1314/Workhours_2017 empty.xls => -1 (Directory not empty)
> [2017-04-12 18:01:55.314062] I [fuse-bridge.c:5016:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-PNSR8s
> [2017-04-12 18:01:55.314311] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064) [0x7f97d3129064] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f97d438a725] -->/usr/sbin/glusterfs(cleanup_and_exit+0x57) [0x7f97d438a5a7] ) 0-: received signum (15), shutting down
> [2017-04-12 18:01:55.314335] I [fuse-bridge.c:5720:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-PNSR8s'.
> 
> How can I fix this issue now and have geo-replication continue synchronising
> again?
> 
> Best regards,
> M.
> 
> -------- Original Message --------
> Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat
> "(unreachable)")
> Local Time: April 11, 2017 9:18 AM
> UTC Time: April 11, 2017 7:18 AM
> From: khiremat at redhat.com
> To: mabi <mabi at protonmail.ch>
> Gluster Users <gluster-users at gluster.org>
> 
> Hi,
> 
> Then please set the following rsync config and let us know if it helps.
> 
> gluster vol geo-rep <mastervol> <slavehost>::<slavevol> config rsync-options "--ignore-missing-args"
> 
> Thanks and Regards,
> Kotresh H R
> 
> ----- Original Message -----
> > From: "mabi" <mabi at protonmail.ch>
> > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > Cc: "Gluster Users" <gluster-users at gluster.org>
> > Sent: Tuesday, April 11, 2017 2:15:54 AM
> > Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat
> > "(unreachable)")
> >
> > Hi Kotresh,
> >
> > I am using the official Debian 8 (jessie) package, which has rsync version
> > 3.1.1.
> >
> > Regards,
> > M.
> >
> > -------- Original Message --------
> > Subject: Re: [Gluster-users] Geo replication stuck (rsync: link_stat
> > "(unreachable)")
> > Local Time: April 10, 2017 6:33 AM
> > UTC Time: April 10, 2017 4:33 AM
> > From: khiremat at redhat.com
> > To: mabi <mabi at protonmail.ch>
> > Gluster Users <gluster-users at gluster.org>
> >
> > Hi Mabi,
> >
> > What's the rsync version being used?
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > ----- Original Message -----
> > > From: "mabi" <mabi at protonmail.ch>
> > > To: "Gluster Users" <gluster-users at gluster.org>
> > > Sent: Saturday, April 8, 2017 4:20:25 PM
> > > Subject: [Gluster-users] Geo replication stuck (rsync: link_stat
> > > "(unreachable)")
> > >
> > > Hello,
> > >
> > > I am using distributed geo replication with two of my GlusterFS 3.7.20
> > > replicated volumes and just noticed that the geo replication for one volume
> > > is not working anymore. It has been stuck since 2017-02-23 22:39. I tried
> > > to stop and restart geo replication, but it stays stuck at that specific
> > > date and time. Under the DATA field of the geo replication "status detail"
> > > command I can see 3879, and it has "Active" as STATUS, but still nothing
> > > happens. I noticed that the rsync process is running but does not do
> > > anything, so I ran strace on the PID of rsync and saw the following:
> > >
> > > write(2, "rsync: link_stat \"(unreachable)/"..., 114
> > >
> > > It looks like rsync can't read or find a file and stays stuck on that. In
> > > the geo-replication log files of the GlusterFS master I can't find any
> > > error messages, just informational messages. For example, when I restart
> > > geo replication I see the following log entries:
> > >
> > > [2017-04-07 21:43:05.664541] I [monitor(monitor):443:distribute] <top>: slave bricks: [{'host': 'gfs1geo.domain', 'dir': '/data/private-geo/brick'}]
> > > [2017-04-07 21:43:05.666435] I [monitor(monitor):468:distribute] <top>: worker specs: [('/data/private/brick', 'ssh://root@gfs1geo.domain:gluster://localhost:private-geo', '1', False)]
> > > [2017-04-07 21:43:05.823931] I [monitor(monitor):267:monitor] Monitor: ------------------------------------------------------------
> > > [2017-04-07 21:43:05.824204] I [monitor(monitor):268:monitor] Monitor: starting gsyncd worker
> > > [2017-04-07 21:43:05.930124] I [gsyncd(/data/private/brick):733:main_i] <top>: syncing: gluster://localhost:private -> ssh://root@gfs1geo.domain:gluster://localhost:private-geo
> > > [2017-04-07 21:43:05.931169] I [changelogagent(agent):73:__init__] ChangelogAgent: Agent listining...
> > > [2017-04-07 21:43:08.558648] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up xsync change detection mode
> > > [2017-04-07 21:43:08.559071] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
> > > [2017-04-07 21:43:08.560163] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up changelog change detection mode
> > > [2017-04-07 21:43:08.560431] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
> > > [2017-04-07 21:43:08.561105] I [master(/data/private/brick):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
> > > [2017-04-07 21:43:08.561391] I [master(/data/private/brick):367:__init__] _GMaster: using 'rsync' as the sync engine
> > > [2017-04-07 21:43:11.354417] I [master(/data/private/brick):1249:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/private/ssh%3A%2F%2Froot%40192.168.20.107%3Agluster%3A%2F%2F127.0.0.1%3Aprivate-geo/616931ac8f39da5dc5834f9d47fc7b1a/xsync
> > > [2017-04-07 21:43:11.354751] I [resource(/data/private/brick):1528:service_loop] GLUSTER: Register time: 1491601391
> > > [2017-04-07 21:43:11.357630] I [master(/data/private/brick):510:crawlwrap] _GMaster: primary master with volume id e7a40a1b-45c9-4d3c-bb19-0c59b4eceec5 ...
> > > [2017-04-07 21:43:11.489355] I [master(/data/private/brick):519:crawlwrap] _GMaster: crawl interval: 1 seconds
> > > [2017-04-07 21:43:11.516710] I [master(/data/private/brick):1163:crawl] _GMaster: starting history crawl... turns: 1, stime: (1487885974, 0), etime: 1491601391
> > > [2017-04-07 21:43:12.607836] I [master(/data/private/brick):1192:crawl] _GMaster: slave's time: (1487885974, 0)
> > >
> > > > Does anyone know how I can find out the root cause of this problem and
> > > > make geo replication work again from the time point it got stuck?
> > >
> > > Many thanks in advance for your help.
> > >
> > > Best regards,
> > > Mabi
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users at gluster.org
> > > http://lists.gluster.org/mailman/listinfo/gluster-users
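
The stime value in the history crawl log line above is a Unix timestamp, so it can be compared directly with the date at which replication was reported stuck. Below is a small sketch for the conversion, assuming GNU date as shipped with Debian. The function name stime_to_utc is hypothetical.

```shell
#!/bin/sh
# Convert a geo-replication stime (seconds since the Unix epoch) to UTC.
# Assumes GNU date; the "-d @SECONDS" form is a GNU extension.
stime_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# stime taken from the "starting history crawl" log line in this thread.
stime_to_utc 1487885974
```

For 1487885974 this prints 2017-02-23 21:39:34 (UTC). Assuming the poster's local time was UTC+1 in February, that is 22:39 local time, which matches the point at which geo replication was reported stuck.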

mabi

2017-Apr-13 15:21 UTC

[Gluster-users] Geo replication stuck (rsync: link_stat "(unreachable)")

Hi Kotresh,

Thanks for your feedback.

So do you mean I can simply log in to the geo-replication slave node, mount the
volume with FUSE, delete the problematic directory, and finally restart
geo-replication?

I am planning to migrate to 3.8 as soon as I have a backup (geo-replication). Is
this issue with DHT fixed in the latest 3.8.x release?

Regards,
M.
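
The restart step asked about above could be sketched as follows. This is a hedged outline only: the cleanup itself is left out on purpose, because the thread does not settle whether it should happen through a FUSE mount or directly on the brick. The helper names run and restart_georep are hypothetical, and by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: stop, start and check a geo-replication session.
# With DRY_RUN=1 (the default) the commands are only printed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments: master volume, slave host, slave volume.
restart_georep() {
    master_vol=$1
    session="$2::$3"
    run gluster volume geo-replication "$master_vol" "$session" stop
    run gluster volume geo-replication "$master_vol" "$session" start
    run gluster volume geo-replication "$master_vol" "$session" status detail
}
```

For example, `restart_georep private gfs1geo.domain private-geo` prints the three commands for the session named in the master log earlier in the thread. Those names are taken from the log and may not match the real setup. Setting DRY_RUN=0 runs the commands instead of printing them.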

