Hi,
I am still getting the same issue, and looking further into this, I have
modified syncdutils.py slightly on the slave node to print more information
on the nature of this OSError:
[2015-09-30 12:19:25.8531] I [gsyncd(slave):649:main_i] <top>: syncing: gluster://localhost:homegs
[2015-09-30 12:19:26.79925] I [resource(slave):844:service_loop] GLUSTER: slave listening
[2015-09-30 12:19:30.723542] W [syncdutils(slave):481:errno_wrap] <top>: about to raise exception for <built-in function rename>(['.gfid/a780729e-3602-4c30-9966-f31fa5964843/CMakeCache.txt.tmp', '.gfid/a780729e-3602-4c30-9966-f31fa5964843/CMakeCache.txt']) from /...[Errno 16] Device or resource busy
[2015-09-30 12:19:30.723845] I [syncdutils(slave):483:errno_wrap] <top>: we could try deleting '.glusterfs/a7/80/a780729e-3602-4c30-9966-f31fa5964843/CMakeCache.txt.tmp'
[2015-09-30 12:19:30.723974] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 725, in entry_ops
    [ENOENT, EEXIST], [ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
    return call(*arg)
OSError: [Errno 16] Device or resource busy
[2015-09-30 12:19:30.770679] I [repce(slave):92:service_loop] RepceServer:
terminating on reaching EOF.
[2015-09-30 12:19:30.770944] I [syncdutils(slave):220:finalize] <top>:
exiting.
Below is a list of errors:
W [syncdutils(slave):481:errno_wrap] <top>: about to raise exception for <built-in function rename>(['.gfid/93b6282a-a47f-40fb-b83e-efec5b9c1a4e/.viminfo.tmp', '.gfid/93b6282a-a47f-40fb-b83e-efec5b9c1a4e/.viminfo']) from /...[Errno 16] Device or resource busy
W [syncdutils(slave):481:errno_wrap] <top>: about to raise exception for <built-in function rename>(['.gfid/9c4f77d0-0855-4607-8d65-1a52b3cdbbce/gr_rsync_homes_new.sh', '.gfid/9c4f77d0-0855-4607-8d65-1a52b3cdbbce/gr_rsync_homes_new.sh~']) from /...[Errno 16] Device or resource busy
W [syncdutils(slave):481:errno_wrap] <top>: about to raise exception for <built-in function rename>(['.gfid/a780729e-3602-4c30-9966-f31fa5964843/CMakeCache.txt.tmp', '.gfid/a780729e-3602-4c30-9966-f31fa5964843/CMakeCache.txt']) from /...[Errno 16] Device or resource busy
W [syncdutils(slave):481:errno_wrap] <top>: about to raise exception for <built-in function rename>(['.gfid/a78e4abc-1de4-4f0d-a5e2-1244a0e3b708/depend.internal.tmp', '.gfid/a78e4abc-1de4-4f0d-a5e2-1244a0e3b708/depend.internal']) from /...[Errno 16] Device or resource busy
W [syncdutils(slave):481:errno_wrap] <top>: about to raise exception for <built-in function rename>(['.gfid/f5491794-32e2-46f4-8c50-7ca31e34111f/gluster-volume-info.debug.pl', '.gfid/f5491794-32e2-46f4-8c50-7ca31e34111f/gluster-volume-info.debug.pl~'])...[Errno 16] Device or resource busy
W [syncdutils(slave):481:errno_wrap] <top>: about to raise exception for <built-in function rename>(['.gfid/f5491794-32e2-46f4-8c50-7ca31e34111f/gluster-volume-info.debug.pl', '.gfid/f5491794-32e2-46f4-8c50-7ca31e34111f/gluster-volume-info.debug.pl~']) from /...[Errno 16] Device or resource busy
I am not entirely clear why rename() would return this "Device or resource busy". Also, I'm not sure what these '.gfid' paths are exactly: are they something transient? Where do they live exactly?
I tried deleting the affected files (e.g. a78e4abc-1de4-4f0d-a5e2-1244a0e3b708/depend.internal.tmp) in .glusterfs/, and so far that allowed the geo-replication to resume... until it hits another one of these.
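Incidentally, working out which backend file corresponds to a gfid seems mechanical: the brick keeps each gfid under .glusterfs/<first two hex chars>/<next two>/<full gfid>, which matches the paths in the logs above. A small helper to compute that (the brick root below is just my slave brick as an example):

```python
import os

def gfid_backend_path(brick_root, gfid):
    # Map a gfid like 'a780729e-...' to its hard-link/symlink location
    # under the brick: .glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid>
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

print(gfid_backend_path("/gluster/homegs-brick-1/brick",
                        "a780729e-3602-4c30-9966-f31fa5964843"))
# -> /gluster/homegs-brick-1/brick/.glusterfs/a7/80/a780729e-3602-4c30-9966-f31fa5964843
```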
In some cases I found that the gfid points to a directory that doesn't exist, e.g.:

# ls -alh /gluster/homegs-brick-1/brick/.glusterfs/a7/8e/a78e4abc-1de4-4f0d-a5e2-1244a0e3b708
lrwxrwxrwx 1 root root 71 Sep 30 11:26 /gluster/homegs-brick-1/brick/.glusterfs/a7/8e/a78e4abc-1de4-4f0d-a5e2-1244a0e3b708 -> ../../0d/be/0dbe0730-5ba9-45d5-91e5-aeb069ab08bf/watchdog_unit_test.dir

but the link is broken.
In that case, deleting /gluster/homegs-brick-1/brick/.glusterfs/a7/8e/a78e4abc-1de4-4f0d-a5e2-1244a0e3b708 allowed the geo-replication to carry on.
Obviously this doesn't work if I have to do it 50 times a day...
Any idea what could cause this and how to fix it properly?
Thx,
Thib.
On 26 Aug 2015 12:58 pm, "Thibault Godouet" <tibo92 at
godouet.net> wrote:
> Hi,
>
> Given that this is an 'OSError', I should probably have said we use Redhat
> 6.6 64bit and XFS as the brick filesystem.
>
> Has anyone any input on how to troubleshoot this?
>
> Thanks,
> Thibault.
> On 24 Aug 2015 4:03 pm, "Thibault Godouet" <
> Thibault.Godouet at gresearch.co.uk> wrote:
>
>> I have had multiple issues with geo-replication. It seems to work OK
>> initially, the replica gets up to date, and not long after (e.g. a couple
>> of days) the replication goes into a faulty state and won't get out of it.
>>
>>
>> I have tried a few times now, and last attempt I re-created the slave
>> volume and setup the replication again. Same symptoms again.
>>
>>
>>
>> I use Gluster 3.7.3, and you will find my setup and log messages at the
>> bottom of the email.
>>
>>
>> Any idea what could cause this and how to fix it?
>>
>> Thanks,
>> Thibault.
>>
>> ps: my setup and log messages:
>>
>>
>>
>> Master:
>>
>>
>>
>> Volume Name: home
>> Type: Replicate
>> Volume ID: 2299a204-a1dc-449d-8556-bc65197373c7
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: server4.uberit.net:/gluster/home-brick-1
>> Brick2: server5.uberit.net:/gluster/home-brick-1
>> Options Reconfigured:
>> performance.readdir-ahead: on
>> geo-replication.indexing: on
>> geo-replication.ignore-pid-check: on
>> changelog.changelog: on
>>
>>
>>
>> Slave:
>>
>> Volume Name: homegs
>> Type: Distribute
>> Volume ID: 746dfdc3-650d-4468-9fdd-d621dd215b94
>> Status: Started
>> Number of Bricks: 1
>> Transport-type: tcp
>> Bricks:
>> Brick1: remoteserver1.uberit.net:/gluster/homegs-brick-1/brick
>> Options Reconfigured:
>> performance.readdir-ahead: on
>>
>>
>>
>> The geo-replication status and config (I think I ended up with only
>> default values) are:
>>
>>
>>
>> # gluster volume geo-replication home ssh://remoteserver1::homegs status
>>
>> MASTER NODE    MASTER VOL    MASTER BRICK             SLAVE USER    SLAVE                          SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>> server5        home          /gluster/home-brick-1    root          ssh://remoteserver1::homegs    N/A           Faulty    N/A             N/A
>> server4        home          /gluster/home-brick-1    root          ssh://remoteserver1::homegs    N/A           Faulty    N/A             N/A
>>
>>
>>
>> # gluster volume geo-replication home ssh://remoteserver1::homegs config
>> special_sync_mode: partial
>> state_socket_unencoded: /var/lib/glusterd/geo-replication/home_remoteserver1_homegs/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs.socket
>> gluster_log_file: /var/log/glusterfs/geo-replication/home/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs.gluster.log
>> ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
>> ignore_deletes: false
>> change_detector: changelog
>> gluster_command_dir: /usr/sbin/
>> georep_session_working_dir: /var/lib/glusterd/geo-replication/home_remoteserver1_homegs/
>> state_file: /var/lib/glusterd/geo-replication/home_remoteserver1_homegs/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs.status
>> remote_gsyncd: /nonexistent/gsyncd
>> session_owner: 2299a204-a1dc-449d-8556-bc65197373c7
>> changelog_log_file: /var/log/glusterfs/geo-replication/home/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs-changes.log
>> socketdir: /var/run/gluster
>> working_dir: /var/lib/misc/glusterfsd/home/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs
>> state_detail_file: /var/lib/glusterd/geo-replication/home_remoteserver1_homegs/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs-detail.status
>> ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
>> pid_file: /var/lib/glusterd/geo-replication/home_remoteserver_homegs/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs.pid
>> log_file: /var/log/glusterfs/geo-replication/home/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs.log
>> gluster_params: aux-gfid-mount acl
>> volume_id: 2299a204-a1dc-449d-8556-bc65197373c7
>>
>>
>>
>> The logs on master server1 look like this:
>>
>>
>>
>> [2015-08-24 15:21:07.955600] I [monitor(monitor):221:monitor] Monitor: ------------------------------------------------------------
>> [2015-08-24 15:21:07.955883] I [monitor(monitor):222:monitor] Monitor: starting gsyncd worker
>> [2015-08-24 15:21:08.69528] I [gsyncd(/gluster/home-brick-1):649:main_i] <top>: syncing: gluster://localhost:home -> ssh://root at vivlinuxinfra1.uberit.net:gluster://localhost:homegs
>> [2015-08-24 15:21:08.70938] I [changelogagent(agent):75:__init__] ChangelogAgent: Agent listining...
>> [2015-08-24 15:21:11.255237] I [master(/gluster/home-brick-1):83:gmaster_builder] <top>: setting up xsync change detection mode
>> [2015-08-24 15:21:11.255532] I [master(/gluster/home-brick-1):404:__init__] _GMaster: using 'rsync' as the sync engine
>> [2015-08-24 15:21:11.256570] I [master(/gluster/home-brick-1):83:gmaster_builder] <top>: setting up changelog change detection mode
>> [2015-08-24 15:21:11.256726] I [master(/gluster/home-brick-1):404:__init__] _GMaster: using 'rsync' as the sync engine
>> [2015-08-24 15:21:11.257345] I [master(/gluster/home-brick-1):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
>> [2015-08-24 15:21:11.257534] I [master(/gluster/home-brick-1):404:__init__] _GMaster: using 'rsync' as the sync engine
>> [2015-08-24 15:21:13.333628] I [master(/gluster/home-brick-1):1212:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/home/ssh%3A%2F%2Froot%40172.18.0.169%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs/62d98e8cc00a34eb85b4fe6d6fd3ba33/xsync
>> [2015-08-24 15:21:13.333870] I [resource(/gluster/home-brick-1):1432:service_loop] GLUSTER: Register time: 1440426073
>> [2015-08-24 15:21:13.401132] I [master(/gluster/home-brick-1):523:crawlwrap] _GMaster: primary master with volume id 2299a204-a1dc-449d-8556-bc65197373c7 ...
>> [2015-08-24 15:21:13.412795] I [master(/gluster/home-brick-1):532:crawlwrap] _GMaster: crawl interval: 1 seconds
>> [2015-08-24 15:21:13.427340] I [master(/gluster/home-brick-1):1127:crawl] _GMaster: starting history crawl... turns: 1, stime: (1440411353, 0)
>> [2015-08-24 15:21:14.432327] I [master(/gluster/home-brick-1):1156:crawl] _GMaster: slave's time: (1440411353, 0)
>> [2015-08-24 15:21:14.890889] E [repce(/gluster/home-brick-1):207:__call__] RepceClient: call 20960:140215190427392:1440426074.56 (entry_ops) failed on peer with OSError
>> [2015-08-24 15:21:14.891124] E [syncdutils(/gluster/home-brick-1):276:log_raise_exception] <top>: FAIL:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
>>     main_i()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 659, in main_i
>>     local.service_loop(*[r for r in [remote] if r])
>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1438, in service_loop
>>     g3.crawlwrap(oneshot=True)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 584, in crawlwrap
>>     self.crawl()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1165, in crawl
>>     self.changelogs_batch_process(changes)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1074, in changelogs_batch_process
>>     self.process(batch)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 952, in process
>>     self.process_change(change, done, retry)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 907, in process_change
>>     failures = self.slave.server.entry_ops(entries)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
>>     return self.ins(self.meth, *a)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
>>     raise res
>> OSError: [Errno 5] Input/output error
>> [2015-08-24 15:21:14.892291] I [syncdutils(/gluster/home-brick-1):220:finalize] <top>: exiting.
>> [2015-08-24 15:21:14.893665] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
>> [2015-08-24 15:21:14.893879] I [syncdutils(agent):220:finalize] <top>: exiting.
>> [2015-08-24 15:21:15.259360] I [monitor(monitor):282:monitor] Monitor: worker(/gluster/home-brick-1) died in startup phase
>>
>>
>>
>> and on master server2:
>>
>>
>>
>> [2015-08-24 15:21:07.650707] I [monitor(monitor):221:monitor] Monitor: ------------------------------------------------------------
>> [2015-08-24 15:21:07.651144] I [monitor(monitor):222:monitor] Monitor: starting gsyncd worker
>> [2015-08-24 15:21:07.764817] I [gsyncd(/gluster/home-brick-1):649:main_i] <top>: syncing: gluster://localhost:home -> ssh://root at remoteserver1.uberit.net:gluster://localhost:homegs
>> [2015-08-24 15:21:07.768552] I [changelogagent(agent):75:__init__] ChangelogAgent: Agent listining...
>> [2015-08-24 15:21:11.9820] I [master(/gluster/home-brick-1):83:gmaster_builder] <top>: setting up xsync change detection mode
>> [2015-08-24 15:21:11.10199] I [master(/gluster/home-brick-1):404:__init__] _GMaster: using 'rsync' as the sync engine
>> [2015-08-24 15:21:11.10946] I [master(/gluster/home-brick-1):83:gmaster_builder] <top>: setting up changelog change detection mode
>> [2015-08-24 15:21:11.11115] I [master(/gluster/home-brick-1):404:__init__] _GMaster: using 'rsync' as the sync engine
>> [2015-08-24 15:21:11.11744] I [master(/gluster/home-brick-1):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
>> [2015-08-24 15:21:11.11933] I [master(/gluster/home-brick-1):404:__init__] _GMaster: using 'rsync' as the sync engine
>> [2015-08-24 15:21:13.59192] I [master(/gluster/home-brick-1):1212:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/home/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs/62d98e8cc00a34eb85b4fe6d6fd3ba33/xsync
>> [2015-08-24 15:21:13.59454] I [resource(/gluster/home-brick-1):1432:service_loop] GLUSTER: Register time: 1440426073
>> [2015-08-24 15:21:13.113203] I [master(/gluster/home-brick-1):523:crawlwrap] _GMaster: primary master with volume id 2299a204-a1dc-449d-8556-bc65197373c7 ...
>> [2015-08-24 15:21:13.122018] I [master(/gluster/home-brick-1):532:crawlwrap] _GMaster: crawl interval: 1 seconds
>> [2015-08-24 15:21:23.209912] E [repce(/gluster/home-brick-1):207:__call__] RepceClient: call 1561:140164806457088:1440426083.11 (keep_alive) failed on peer with OSError
>> [2015-08-24 15:21:23.210119] E [syncdutils(/gluster/home-brick-1):276:log_raise_exception] <top>: FAIL:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
>>     tf(*aa)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 438, in keep_alive
>>     cls.slave.server.keep_alive(vi)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
>>     return self.ins(self.meth, *a)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
>>     raise res
>> OSError: [Errno 22] Invalid argument
>> [2015-08-24 15:21:23.210975] I [syncdutils(/gluster/home-brick-1):220:finalize] <top>: exiting.
>> [2015-08-24 15:21:23.212455] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
>> [2015-08-24 15:21:23.212707] I [syncdutils(agent):220:finalize] <top>: exiting.
>> [2015-08-24 15:21:24.23336] I [monitor(monitor):282:monitor] Monitor: worker(/gluster/home-brick-1) died in startup phase
>>
>>
>>
>> and on the slave (in a different timezone, one hour behind):
>>
>>
>>
>> [2015-08-24 14:22:02.923098] I [resource(slave):844:service_loop] GLUSTER: slave listening
>> [2015-08-24 14:22:07.606774] E [repce(slave):117:worker] <top>: call failed:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
>>     res = getattr(self.obj, rmeth)(*in_data[2:])
>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 731, in entry_ops
>>     [ESTALE, EINVAL])
>>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
>>     return call(*arg)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 78, in lsetxattr
>>     cls.raise_oserr()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
>>     raise OSError(errn, os.strerror(errn))
>> OSError: [Errno 5] Input/output error
>> [2015-08-24 14:22:07.652092] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
>> [2015-08-24 14:22:07.652364] I [syncdutils(slave):220:finalize] <top>: exiting.
>>
>>
>>
>>
>