Davide Obbi
2019-Mar-17 19:22 UTC
[Gluster-users] geo-replication - OSError: [Errno 1] Operation not permitted - failing with socket files?
Hi,

I am trying to understand why geo-replication starts failing during the "History Crawl" on each of the three bricks, one after the other. I have enabled DEBUG for all the logs configurable through the geo-replication command.

Running glusterfs v4.16, the behaviour is as follows:

- The "History Crawl" worked fine for about one hour; it actually replicated some files and folders, although most of them look empty.
- At some point the worker becomes faulty, tries to start on another brick, becomes faulty there as well, and so on.
- In the logs, the Python exception mentioned in the subject is raised:

[2019-03-17 18:52:49.565040] E [syncdutils(worker /var/lib/heketi/mounts/vg_b088aec908c959c75674e01fb8598c21/brick_f90f425ecb89c3eec6ef2ef4a2f0a973/brick):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
    func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
    local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1291, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1569, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1469, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1304, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1203, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
    raise res
OSError: [Errno 1] Operation not permitted

- The operation before the exception:

[2019-03-17 18:52:49.545103] D
[master(worker /var/lib/heketi/mounts/vg_b088aec908c959c75674e01fb8598c21/brick_f90f425ecb89c3eec6ef2ef4a2f0a973/brick):1186:process_change] _GMaster: entries: [{'uid': 7575, 'gfid': 'e1ad7c98-f32a-4e48-9902-cc75840de7c3', 'gid': 100, 'mode': 49536, 'entry': '.gfid/5219e4b8-a1f3-4a4e-b9c7-c9b129abe671/.control_f7c33270dc9db9234d005406a13deb4375459715.6lvofzOuVnfAwOwY', 'op': 'MKNOD'}, {'gfid': 'e1ad7c98-f32a-4e48-9902-cc75840de7c3', 'entry': '.gfid/5219e4b8-a1f3-4a4e-b9c7-c9b129abe671/.control_f7c33270dc9db9234d005406a13deb4375459715', 'stat': {'atime': 1552661403.3846507, 'gid': 100, 'mtime': 1552661403.3846507, 'uid': 7575, 'mode': 49536}, 'link': None, 'op': 'LINK'}, {'gfid': 'e1ad7c98-f32a-4e48-9902-cc75840de7c3', 'entry': '.gfid/5219e4b8-a1f3-4a4e-b9c7-c9b129abe671/.control_f7c33270dc9db9234d005406a13deb4375459715.6lvofzOuVnfAwOwY', 'op': 'UNLINK'}]

[2019-03-17 18:52:49.548614] D [repce(worker /var/lib/heketi/mounts/vg_b088aec908c959c75674e01fb8598c21/brick_f90f425ecb89c3eec6ef2ef4a2f0a973/brick):179:push] RepceClient: call 56917:140179359156032:1552848769.55 entry_ops([{'uid': 7575, 'gfid': 'e1ad7c98-f32a-4e48-9902-cc75840de7c3', 'gid': 100, 'mode': 49536, 'entry': '.gfid/5219e4b8-a1f3-4a4e-b9c7-c9b129abe671/.control_f7c33270dc9db9234d005406a13deb4375459715.6lvofzOuVnfAwOwY', 'op': 'MKNOD'}, {'gfid': '*e1ad7c98-f32a-4e48-9902-cc75840de7c3*', 'entry': '.gfid/5219e4b8-a1f3-4a4e-b9c7-c9b129abe671/.control_f7c33270dc9db9234d005406a13deb4375459715', 'stat': {'atime': 1552661403.3846507, 'gid': 100, 'mtime': 1552661403.3846507, 'uid': 7575, 'mode': 49536}, 'link': None, 'op': 'LINK'}, {'gfid': 'e1ad7c98-f32a-4e48-9902-cc75840de7c3', 'entry': '.gfid/5219e4b8-a1f3-4a4e-b9c7-c9b129abe671/*.control_f7c33270dc9db9234d005406a13deb4375459715.6lvofzOuVnfAwOwY*', 'op': 'UNLINK'}],) ...
- The highlighted gfid (e1ad7c98-f32a-4e48-9902-cc75840de7c3) points to these control files, which are unix sockets, as shown below:

rw------- 2 pippo users 0 Mar 14 16:32 .control_31c3a99664c1f956f949311e58434037e6a52d22
srw------- 2 pippo users 0 Mar 14 16:33 .control_a9b82937042529bca677b9f43eba9eb02ca7c5ee
srw------- 2 pippo users 0 Mar 14 16:32 .control_f429221460d52570066d9f25521011fe7e081cf5
srw------- 2 pippo users 0 Mar 15 15:50 .control_f7c33270dc9db9234d005406a13deb4375459715

So it seems geo-replication should at least skip such files rather than raise an exception? Am I the first to experience this behaviour?

Thanks in advance,
Davide
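P.S. As a cross-check that the failing entries really describe sockets, the 'mode' value from the log (49536) can be decoded with Python's standard stat module. The snippet below is only a sketch: describe_mode and non_socket_entries are hypothetical helper names used for illustration, not part of gsyncd, and the filter merely shows what "skipping such files" could look like on a list shaped like the entries in the debug log.

```python
import stat

# Mode value copied from the MKNOD/LINK entries in the debug log.
MODE_FROM_LOG = 49536


def describe_mode(mode):
    """Return (octal string, ls-style string, is_socket) for an st_mode value."""
    return oct(mode), stat.filemode(mode), stat.S_ISSOCK(mode)


def non_socket_entries(entries):
    """Hypothetical filter (NOT gsyncd code): drop entries whose mode marks a socket.

    MKNOD entries carry 'mode' at the top level, LINK entries carry it under
    'stat'. Entries without any mode (e.g. UNLINK) are kept unchanged.
    """
    kept = []
    for entry in entries:
        mode = entry.get('mode')
        if mode is None:
            mode = (entry.get('stat') or {}).get('mode')
        if mode is not None and stat.S_ISSOCK(mode):
            continue
        kept.append(entry)
    return kept


if __name__ == "__main__":
    print(describe_mode(MODE_FROM_LOG))
```

Here 49536 is 0o140600, i.e. S_IFSOCK (0o140000) plus permissions 0600, which matches the srw------- lines in the listing. Whether gsyncd should drop such entries, or handle them differently on the slave side, is exactly the open question; the filter is just one way to express the idea.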