Morten Johansen
2014-Nov-07 20:23 UTC
[Gluster-users] Geo-replication fails on self.slave.server.set_stime() with OSError: [Errno 2] No such file or directory
Hi, list
We're having some issues with geo-replication, which I _think_ are related to
delete operations.
Sometimes the replication goes into a faulty state and then recovers on its
own after a while.
Changelog change detection fails, and it falls back to xsync. The slave volume
does not replicate deleted files.
My research led me to this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1073844
The bug lists a traceback which is very similar to the one we're seeing in our
logs.
We're running version 3.5.2, which has this bug fix in it, and inspecting the
master.py file on our actual servers confirms we do have this patch:
http://review.gluster.org/#/c/7207/2/geo-replication/syncdaemon/master.py
In our case, the failure happens in the call on the line BEFORE the patched one,
i.e. the call to self.slave.server.set_stime() on line 152 in master.py.
This is an example traceback from our logs:
<SNIP>
[2014-11-07 12:47:07.516124] I [master(/media/slot2/geotest):1124:crawl] _GMaster: starting hybrid crawl...
[2014-11-07 12:47:07.518146] I [master(/media/slot2/geotest):1133:crawl] _GMaster: processing xsync changelog /var/run/gluster/geotest/ssh%3A%2F%2Froot%4010.32.0.101%3Agluster%3A%2F%2F127.0.0.1%3Ageotest/d531d53915b53c130ad434b5295ebf7c/xsync/XSYNC-CHANGELOG.1415360827
[2014-11-07 12:47:07.520725] E [syncdutils(/media/slot2/geotest):240:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 542, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1177, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 467, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1137, in crawl
    self.upd_stime(item[1][1], item[1][0])
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 884, in upd_stime
    self.sendmark(path, stime)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 658, in sendmark
    self.set_slave_xtime(path, mark)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 152, in set_slave_xtime
    self.slave.server.set_stime(path, self.uuid, mark)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1163, in <lambda>
    slave.server.set_stime = types.MethodType(lambda _self, path, uuid, mark: brickserver.set_stime(path, uuid + '.' + gconf.slave_id, mark), slave.server)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 299, in ff
    return f(*a)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 496, in set_stime
    Xattr.lsetxattr(path, '.'.join([cls.GX_NSPACE, uuid, 'stime']), struct.pack('!II', *mark))
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 66, in lsetxattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 25, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 2] No such file or directory
[2014-11-07 12:47:07.522511] I [syncdutils(/media/slot2/geotest):192:finalize] <top>: exiting.
</SNIP>
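To make the last frames of the traceback concrete: set_stime packs the mark with struct.pack('!II', ...), i.e. two unsigned 32-bit integers in network byte order, and then sets it as an xattr on the path. Errno 2 suggests the path no longer exists at that moment, which would fit my delete theory. Below is a minimal sketch of that reading. The helper names (pack_stime, unpack_stime, probe_missing_path) are my own and not part of gsyncd, and os.lstat is only a stand-in for lsetxattr, assuming both report ENOENT for a missing path.

```python
import errno
import os
import struct


def pack_stime(mark):
    # Same format string as in the traceback: two unsigned 32-bit
    # integers in network (big-endian) byte order, 8 bytes in total.
    return struct.pack('!II', *mark)


def unpack_stime(blob):
    # Inverse of pack_stime; returns a 2-tuple of ints.
    return struct.unpack('!II', blob)


def probe_missing_path(path):
    # Hypothetical helper: returns the errno from touching `path`
    # without following symlinks, or 0 if the path exists.
    # os.lstat stands in for lsetxattr here (an assumption).
    try:
        os.lstat(path)
    except OSError as e:
        return e.errno
    return 0
```

Running probe_missing_path on a path that was deleted between the crawl and the xattr call should return errno.ENOENT, the same error number as in our log.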
Any ideas on this one? What breaks if I comment out line 152 too?
Any quick fixes on this would be much appreciated.
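Rather than commenting the line out entirely, I was considering something like the sketch below, which skips only the "No such file or directory" case and still raises everything else. This is an untested idea and not the upstream fix: set_stime_tolerant is a name I made up, and I am assuming that a missing path on the slave is safe to ignore here, which is exactly what I am unsure about.

```python
import errno


def set_stime_tolerant(set_stime, path, uuid, mark):
    # Hypothetical wrapper (not gsyncd code): call the given set_stime
    # callable and swallow only ENOENT, i.e. the path disappeared.
    # Returns True if the stime was set, False if the path was missing.
    try:
        set_stime(path, uuid, mark)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise  # any other error is still fatal
        return False
    return True
```

In master.py this would wrap the call to self.slave.server.set_stime, passing that method as the first argument.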
Best regards,
--
Morten Johansen
Systems developer, Cerum AS