Morten Johansen
2014-Nov-07 20:23 UTC
[Gluster-users] Geo-replication fails on self.slave.server.set_stime() with OSError: [Errno 2] No such file or directory
Hi, list

We're having some issues with geo-replication, which I _think_ are related to delete operations. Sometimes the replication goes into faulty state, and then after a while comes back again. Changelog change detection fails, and it falls back to xsync. The slave volume does not replicate deleted files.

My research led me to this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1073844

The bug lists a traceback which is very similar to the one we're seeing in our logs. We're running version 3.5.2, which has this bug fix in it, and inspecting the master.py file on our actual servers confirms we do have this patch:
http://review.gluster.org/#/c/7207/2/geo-replication/syncdaemon/master.py

In our case, something fails in the call on the line BEFORE the patched one, i.e. the call to self.slave.server.set_stime() on line 152 in master.py.

This is an example traceback from our logs:

<SNIP>
[2014-11-07 12:47:07.516124] I [master(/media/slot2/geotest):1124:crawl] _GMaster: starting hybrid crawl...
[2014-11-07 12:47:07.518146] I [master(/media/slot2/geotest):1133:crawl] _GMaster: processing xsync changelog /var/run/gluster/geotest/ssh%3A%2F%2Froot%4010.32.0.101%3Agluster%3A%2F%2F127.0.0.1%3Ageotest/d531d53915b53c130ad434b5295ebf7c/xsync/XSYNC-CHANGELOG.1415360827
[2014-11-07 12:47:07.520725] E [syncdutils(/media/slot2/geotest):240:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 542, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1177, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 467, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1137, in crawl
    self.upd_stime(item[1][1], item[1][0])
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 884, in upd_stime
    self.sendmark(path, stime)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 658, in sendmark
    self.set_slave_xtime(path, mark)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 152, in set_slave_xtime
    self.slave.server.set_stime(path, self.uuid, mark)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1163, in <lambda>
    slave.server.set_stime = types.MethodType(lambda _self, path, uuid, mark: brickserver.set_stime(path, uuid + '.' + gconf.slave_id, mark), slave.server)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 299, in ff
    return f(*a)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 496, in set_stime
    Xattr.lsetxattr(path, '.'.join([cls.GX_NSPACE, uuid, 'stime']), struct.pack('!II', *mark))
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 66, in lsetxattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 25, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 2] No such file or directory
[2014-11-07 12:47:07.522511] I [syncdutils(/media/slot2/geotest):192:finalize] <top>: exiting.
</SNIP>

Any ideas on this one? What breaks if I comment out line 152 too? Any quick fixes on this would be much appreciated.

Best regards,

-- 
Morten Johansen
Systems developer, Cerum AS
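
P.S. To make the question more concrete: rather than commenting out line 152 entirely, something like the following is what I have in mind. This is only a minimal standalone sketch, not the actual gsyncd code. It mirrors the key construction and struct.pack('!II', *mark) call visible in the traceback, and skips the stime update only when the path no longer exists (ENOENT). The namespace value, the function names and the injectable setxattr parameter are all my own placeholders, and I don't know whether skipping the update is actually safe for later crawls.

```python
import errno
import os
import struct

# Assumption: placeholder namespace. The real value is cls.GX_NSPACE in
# resource.py, which is not shown in the traceback.
GX_NSPACE = "user.example"


def stime_key(uuid, nspace=GX_NSPACE):
    """Build the xattr name the same way the traceback shows: NSPACE.uuid.stime."""
    return ".".join([nspace, uuid, "stime"])


def pack_mark(mark):
    """Pack a (seconds, nanoseconds) pair as two big-endian unsigned 32-bit ints."""
    return struct.pack("!II", *mark)


def set_stime_tolerant(path, uuid, mark, setxattr=None):
    """Hypothetical helper: set the stime xattr, tolerating a vanished path.

    Returns True if the xattr was set and False if the path did not exist.
    Any other OSError is re-raised unchanged.
    """
    if setxattr is None:
        # os.setxattr is Linux-only; follow_symlinks=False gives lsetxattr semantics.
        def setxattr(p, key, value):
            os.setxattr(p, key, value, follow_symlinks=False)
    try:
        setxattr(path, stime_key(uuid), pack_mark(mark))
    except OSError as e:
        if e.errno == errno.ENOENT:
            # Path was probably deleted between the crawl and the update.
            return False
        raise
    return True
```

The idea would be to call something like this in place of the bare set_stime() call and log the skipped path, so that a file deleted mid-crawl does not take the whole worker down.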