Kotte, Christian (Ext)
2018-Sep-24 15:10 UTC
[Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied
Yeah right. I get permission denied.

[geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e: Permission denied
[geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/
ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/: Permission denied
[geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/
ls: cannot access /bricks/brick1/brick/.glusterfs/29/: Permission denied
[geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/
ls: cannot open directory /bricks/brick1/brick/.glusterfs/: Permission denied

[root@slave ~]# ll /bricks/brick1/brick/.glusterfs/29
total 0
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 16
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 33
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 5e
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 73
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 b2
drwx--S---+ 2 root AD+group 50 Sep 21 09:39 d1
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 d7
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 e6
drwx--S---+ 2 root AD+group 50 Sep 10 07:29 eb
[root@slave ~]#

However, the strange thing is that I could replicate new files and folders before. The replication has been broken since "New folder" was created.

These are the permissions on a dev/test system:

[root@slave-dev ~]# ll /bricks/brick1/brick/.glusterfs/
total 3136
drwx------. 44 root root 4096 Aug 22 18:19 00
drwx------. 50 root root 4096 Sep 12 13:14 01
drwx------. 54 root root 4096 Sep 13 11:33 02
drwx------. 59 root root 4096 Aug 22 18:21 03
drwx------. 60 root root 4096 Sep 12 13:14 04
drwx------. 68 root root 4096 Aug 24 12:36 05
drwx------. 56 root root 4096 Aug 22 18:21 06
drwx------. 46 root root 4096 Aug 22 18:21 07
drwx------. 51 root root 4096 Aug 22 18:21 08
drwx------. 42 root root 4096 Aug 22 18:21 09
drwx------. 44 root root 4096 Sep 13 11:16 0a

I've configured an AD group, the SGID bit, and ACLs via Ansible on the local mount point. Could this be an issue? Should I avoid configuring the permissions on .glusterfs and below?

# ll /mnt/glustervol1/
total 12
drwxrwsr-x+  4 AD+user AD+group 4096 Jul 13 07:46 Scripts
drwxrwxr-x+ 10 AD+user AD+group 4096 Jun 12 12:03 Software
-rw-rw-r--+  1 root    AD+group    0 Aug  8 08:44 test
drwxr-xr-x+  6 AD+user AD+group 4096 Apr 18 10:58 tftp

glusterfs_volumes:
  - [...]
    permissions:
      mode: "02775"
      owner: root
      group: "AD+group"
      acl_permissions: rw
    [...]

# root directory is owned by root.
# set permissions to 'g+s' to automatically set the group to "AD+group"
# permissions of individual files will be set by Samba during creation
- name: Configure volume directory permission 1/2
  tags: glusterfs
  file:
    path: /mnt/{{ item.volume }}
    state: directory
    mode: "{{ item.permissions.mode }}"
    owner: "{{ item.permissions.owner }}"
    group: "{{ item.permissions.group }}"
  with_items: "{{ glusterfs_volumes }}"
  loop_control:
    label: "{{ item.volume }}"
  when: item.permissions is defined

# ACL needs to be set to override the default umask and grant "AD+group" write permissions
- name: Configure volume directory permission 2/2 (ACL)
  tags: glusterfs
  acl:
    path: /mnt/{{ item.volume }}
    default: yes
    entity: "{{ item.permissions.group }}"
    etype: group
    permissions: "{{ item.permissions.acl_permissions }}"
    state: present
  with_items: "{{ glusterfs_volumes }}"
  loop_control:
    label: "{{ item.volume }}"
  when: item.permissions is defined

Regards,
Christian
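A quick way to compare what the playbook put on the mount point with what has actually ended up on the brick internals, and to test whether the geo-rep user can get through them. This is a diagnostic sketch only; the paths and the geoaccount user are taken from the listings above:

# Compare ACL/mode on the volume root, the brick root, and a .glusterfs gfid bucket (run as root on the slave).
getfacl -p /mnt/glustervol1 /bricks/brick1/brick /bricks/brick1/brick/.glusterfs/29/d1

# Check whether the geo-rep user can traverse down to the gfid symlink.
# "Permission denied" on any component is enough to break gsyncd's readlink.
sudo -u geoaccount stat /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e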
From: Kotresh Hiremath Ravishankar <khiremat at redhat.com>
Date: Monday, 24. September 2018 at 16:20
To: "Kotte, Christian (Ext)" <christian.kotte at novartis.com>
Cc: Gluster Users <gluster-users at gluster.org>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

I think I get what's happening. The geo-rep session is non-root. Could you do a readlink on the brick path mentioned above, /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e, as the geoaccount user and see if you get "Permission denied" errors?

Thanks,
Kotresh HR

On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) <christian.kotte at novartis.com> wrote:

Ok. It happens on all slave nodes (and on the interimmaster as well). It's like I assumed. These are the logs of one of the slaves:

gsyncd.log

[2018-09-24 13:52:25.418382] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-24 13:52:37.95297] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:37.109643] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-24 13:52:38.303920] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1941
[2018-09-24 13:52:38.304771] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
[2018-09-24 13:52:41.981554] I [resource(slave slave/bricks/brick1/brick):598:entry_ops] <top>: Special case: rename on mkdir gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder'
[2018-09-24 13:52:42.45641] E [repce(slave slave/bricks/brick1/brick):105:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599, in entry_ops
    src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682, in get_slv_dir_path
    [ENOENT], [ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 540, in errno_wrap
    return call(*arg)
OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
[2018-09-24 13:52:42.81794] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-24 13:52:53.459676] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
[2018-09-24 13:52:53.473500] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-24 13:52:54.659044] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1854
[2018-09-24 13:52:54.659837] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening

The folder "New folder" was created via Samba, and it was renamed by my colleague right away after creation.
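The traceback pins the failure to resolving that gfid symlink as the non-root session user. Whether permissions under .glusterfs should be touched at all is exactly the open question here, but as an experiment one could grant the geo account traverse/read rights there via an ACL and retest the failing call. A sketch, not a confirmed fix; user and paths are the ones from above:

# Grant the geo-rep user read+traverse on the brick's internal .glusterfs tree (diagnostic experiment only).
setfacl -R -m u:geoaccount:rx /bricks/brick1/brick/.glusterfs
# Add a default ACL on the directories so newly created gfid buckets inherit it.
find /bricks/brick1/brick/.glusterfs -type d -exec setfacl -m d:u:geoaccount:rx {} +

# Retest the exact call that fails in get_slv_dir_path:
sudo -u geoaccount readlink /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e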
[root@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
[root@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/ -al
total 0
drwx--S---+  2 root AD+group 50 Sep 21 09:39 .
drwx--S---+ 11 root AD+group 96 Sep 21 09:39 ..
lrwxrwxrwx.  1 root AD+group 75 Sep 21 09:39 29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager

I tried creating the folder in /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/, but it didn't change anything.

mnt-slave-bricks-brick1-brick.log

[2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-glustervol1-client-0: changing port to 49152 (from 0)
[2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data] (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a) [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16) [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91) [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument]
[2018-09-24 13:51:10.643507] I [MSGID: 114006] [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed to set process-name in handshake msg
[2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
[2018-09-24 13:51:10.671460] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0: Connected to glustervol1-client-0, attached to remote volume '/bricks/brick1/brick'.
[2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-09-24 13:51:10.673329] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory selfheal failed: Unable to form layout for directory /
[2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc] 0-fuse: initating unmount of /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E
[2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-: received signum (15), shutting down
[2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse: Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
[2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.

Regards,
Christian
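Because the gfid entry is a symlink into another bucket (6b/97/...), the geo account needs traverse rights on the parent directories of both the link and its target. namei (from util-linux) shows exactly which path component denies access; the paths are the ones from the listing above:

# Show owner/group/mode of every component leading to the gfid symlink, following the link.
namei -l /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e

# Resolve the link and check whether the geo-rep user can reach the target
# (quoting matters here - the target directory name contains spaces).
sudo -u geoaccount stat "$(readlink -f /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e)"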
From: Kotresh Hiremath Ravishankar <khiremat at redhat.com>
Date: Saturday, 22. September 2018 at 06:52
To: "Kotte, Christian (Ext)" <christian.kotte at novartis.com>
Cc: Gluster Users <gluster-users at gluster.org>
Subject: Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

The problem occurred on the slave side, and its error is propagated to the master. Almost any traceback involving repce points to a problem on the slave. Just check a few lines above in the log to find the slave node the crashed worker is connected to, and get the geo-replication logs from there to debug further.

On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <christian.kotte at novartis.com> wrote:

Hi,

Any idea how to troubleshoot this?

New folders and files were created on the master and the replication went faulty. They were created via Samba.

Version: GlusterFS 4.1.3

[root@master]# gluster volume geo-replication status

MASTER NODE    MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                                          SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount at slave_1::glustervol1          N/A           Faulty    N/A             N/A
master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount at slave_2::glustervol1          N/A           Faulty    N/A             N/A
master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount at interimmaster::glustervol1    N/A           Faulty    N/A             N/A
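For reference, a sketch of the usual commands for inspecting and restarting one of these faulty sessions once the slave-side problem is fixed. The session names are taken from the status output above; this is not output from this system:

# Per-brick detail for one session (slave_1 is the placeholder host from the status output).
gluster volume geo-replication glustervol1 geoaccount@slave_1::glustervol1 status detail

# After fixing permissions on the slave brick, restart the session so the
# workers can leave the Faulty state and resume the history crawl.
gluster volume geo-replication glustervol1 geoaccount@slave_1::glustervol1 stop
gluster volume geo-replication glustervol1 geoaccount@slave_1::glustervol1 start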
The following error is repeatedly logged in the gsyncd.logs:

[2018-09-21 14:26:38.611479] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
[2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/bricks/brick1/brick
[2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/bricks/brick1/brick slave_node=nrchbs-slp2020.nibr.novartis.net
[2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.473917] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
[2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/gsyncd.conf
[2018-09-21 14:26:49.538049] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
[2018-09-21 14:26:53.5017] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established. duration=3.4665
[2018-09-21 14:26:53.5419] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
[2018-09-21 14:26:54.120374] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1146
[2018-09-21 14:26:54.121012] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
[2018-09-21 14:26:56.144460] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/glustervol1_nrchbs-slp2020.nibr.novartis.net_glustervol1/bricks-brick1-brick
[2018-09-21 14:26:56.145145] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537540016
[2018-09-21 14:26:56.160064] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change status=Active
[2018-09-21 14:26:56.161175] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
[2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0) etime=1537540016
[2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
[2018-09-21 14:26:56.197065] I [master(worker /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry ops to_changelog=1537522638 num_changelogs=1 from_changelog=1537522638
[2018-09-21 14:26:56.197402] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1
[2018-09-21 14:26:56.197623] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0 XATT=0
[2018-09-21 14:26:56.198230] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1537522638 entry_stime=(1537537141, 0) changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333 num_changelogs=1 mode=history_changelog
[2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
[2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed call=17209:140650361157440:1537540017.21 method=entry_ops error=OSError
[2018-09-21 14:26:57.529371] E [syncdutils(worker /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
    func(args)
  File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
    local.service_loop(remote)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
    raise res
OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'

The permissions look fine.
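The entry_ops call in this traceback is executed by the slave-side worker over repce, so it is that worker's account, not root on the master, that has to read the gfid path. Which account that actually is can be checked on the slave node; a sketch, with the mountbroker path taken from the slave mount log above:

# List the gsyncd processes on the slave and the user they run as.
ps -eo user,pid,args | grep -E '[g]syncd'

# The mountbroker aux mount directory (user1300 in the log above) should belong
# to the geo-rep user rather than root.
ls -ld /var/mountbroker-root/user1300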
The replication is done via the geo user instead of root. It should be able to read, but I'm not sure whether the syncdaemon runs under geoaccount!?

[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager
[root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
total 4
drwxrwxr-x. 2 AD+user AD+group  131 Sep 21 10:14 6.7
drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
drwxrwxr-x. 2 AD+user AD+group   57 Sep 21 10:28 7.5
[root@master vRealize Operation Manager]#

It could be possible that the folder was renamed. I had 3 similar issues since I migrated to GlusterFS 4.x but couldn't investigate much. I needed to completely wipe GlusterFS and the geo-replication to get rid of this error...

Any help is appreciated.

Regards,
Christian Kotte

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

--
Thanks and Regards,
Kotresh H R
Kotte, Christian (Ext)
2018-Sep-24 15:19 UTC
[Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied
I don't configure the permissions of /bricks/brick1/brick/.glusterfs. I don't even see it on the local GlusterFS mount. Not sure why the permissions are configured with the S bit and the AD group?

Regards,
Christian
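One plausible explanation for the S bit and the AD group appearing under .glusterfs, though it is not confirmed in this thread: the brick root tends to end up with the same mode and default ACL as the volume root, and ordinary POSIX inheritance then stamps the group, the setgid bit, and the ACL onto any directory created beneath it, including the gfid buckets. The inheritance itself is easy to demonstrate on a scratch directory (hypothetical path and group; any existing group stands in for AD+group):

mkdir /tmp/inherit-demo
chgrp users /tmp/inherit-demo                # stand-in for AD+group
chmod 2770 /tmp/inherit-demo                 # setgid bit, like the 02775 mode in the playbook
setfacl -m d:g:users:rwx /tmp/inherit-demo   # default ACL, like the Ansible acl task
mkdir /tmp/inherit-demo/new-bucket
ls -ld /tmp/inherit-demo/new-bucket          # group, setgid bit ('s') and ACL ('+') are inherited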