Strahil Nikolov
2020-Feb-10 15:28 UTC
[Gluster-users] Permission denied at some directories/files after a split brain
On February 10, 2020 3:53:08 PM GMT+02:00, Alberto Bengoa <bengoa at gmail.com> wrote:
>Hello guys,
>
>We are running GlusterFS 6.6 in Replicate mode (1 x 3). After a split-brain
>and a massive heal process, we noticed that our app started to receive
>thousands of "Permission denied" errors while trying to access files and
>directories.
>
>Example log of a failed access attempt to a specific directory:
>
>[2020-02-10 10:38:17.402080] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-app_data-access-control: client: CTX_ID:7d744c50-43a1-4f81-9330-001b5dcaddb7-GRAPH_ID:0-PID:2310-HOST:ast10.local.domain-PC_NAME:app_data-client-1-RECON_NO:-1, gfid: 092f1e28-d6a8-4ca9-95d5-75dc8ad1c835, req(uid:498,gid:498,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
>[2020-02-10 10:38:17.402182] E [MSGID: 115056] [server-rpc-fops_v2.c:687:server4_opendir_cbk] 0-app_data-server: 6257941: OPENDIR /mailboxes.old/8692/211411002/Old (092f1e28-d6a8-4ca9-95d5-75dc8ad1c835), client: CTX_ID:7d744c50-43a1-4f81-9330-001b5dcaddb7-GRAPH_ID:0-PID:2310-HOST:ast10.local.domain-PC_NAME:app_data-client-1-RECON_NO:-1, error-xlator: app_data-access-control [Permission denied]
>
>The permission denial happens only to unprivileged users, even if that
>unprivileged user is the directory owner. The root user is able to access
>all files, and if we "touch" the file/directory as root it *sometimes*
>fixes the problem.
>
>We noticed inconsistent Access/Change dates. Here is a stat of a directory
>before touching it, showing these inconsistencies:
>
>  File: ‘Old’
>  Size: 4096 Blocks: 8 IO Block: 131072 directory
>Device: 27h/39d Inode: 10388898073370567318 Links: 2
>Access: (2775/drwxrwsr-x) Uid: ( 498/app) Gid: ( 498/app)
>Access: 1970-01-01 01:00:00.000000000 +0100
>Modify: 2020-02-07 13:21:10.365297527 +0000
>Change: 1970-01-01 01:00:00.000000000 +0100
> Birth: -
>
>I think this case is similar to the one reported here[1] and discussed in the
>thread "ACL issue v6.6, v6.7, v7.1, v7.2", despite the fact that we are not
>using libvirt. We do use ACLs, but not on this particular directory.
>
>Any thoughts on this?
>
>[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1797099
>
>Thanks,
>Alberto Bengoa

Hi Alberto,

Sadly, you should verify whether the issue is the same. Enable trace logs for the bricks and check whether the errors in the logs match those in the bugzilla. Don't forget to stop the trace logging, or your log directory will fill up.

What version of Gluster are you using?

In my case, only a downgrade restored the operation of the cluster, so you should consider that as an option (the last one, but still an option).

You can try running find against the FUSE mount:
find /path/to/fuse -exec setfacl -m u:root:rw {} \;
Maybe that will force Gluster to read the ACLs again.

Good luck! If you have the option, join the next Gluster meeting and ask for an update (if the issue is actually the same).

Best Regards,
Strahil Nikolov
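The posix_acl_log_permit_denied entry in Alberto's log carries two tuples: req(...), the credentials of the incoming request (uid 498, gid 498, perm 4, presumably the read bit needed for the OPENDIR), and ctx(...), which appears to be the ownership and mode the access-control translator holds for that inode (uid 0, gid 0, perm 000, updated-fop:INVALID). That context disagrees with the stat output (owner 498, mode 2775), which would explain why unprivileged users are refused while root presumably bypasses the check. With thousands of such entries, grouping them by gfid helps size the problem. The Python sketch below is written against the single sample line above; the regular expression and the names parse_acl_denial and zeroed_ctx_gfids are hypothetical helpers, not part of Gluster, and treating a root-owned, perm-000 context as the symptom is an assumption drawn from that one sample.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Hypothetical pattern for the fields printed by posix_acl_log_permit_denied,
# written against the single sample line in this thread; other Gluster
# versions may format the message differently.
_DENIAL_RE = re.compile(
    r"gfid: (?P<gfid>[0-9a-f-]{36}), "
    r"req\(uid:(?P<req_uid>\d+),gid:(?P<req_gid>\d+),perm:(?P<req_perm>\d+),ngrps:\d+\), "
    r"ctx\(uid:(?P<ctx_uid>\d+),gid:(?P<ctx_gid>\d+),in-groups:\d+,perm:(?P<ctx_perm>\d+),"
    r"updated-fop:(?P<fop>[A-Z_]+)"
)


class AclDenial(NamedTuple):
    gfid: str
    req_uid: int      # uid of the client request
    req_gid: int
    req_perm: int     # requested access mask (4 in the sample, presumably read)
    ctx_uid: int      # owner apparently recorded in the translator's inode context
    ctx_gid: int
    ctx_perm: str     # kept exactly as logged, e.g. "000"
    updated_fop: str  # "INVALID" in the sample


def parse_acl_denial(line: str) -> Optional[AclDenial]:
    """Return the req/ctx fields of one denial line, or None if it does not match."""
    m = _DENIAL_RE.search(line)
    if m is None:
        return None
    return AclDenial(
        gfid=m["gfid"],
        req_uid=int(m["req_uid"]),
        req_gid=int(m["req_gid"]),
        req_perm=int(m["req_perm"]),
        ctx_uid=int(m["ctx_uid"]),
        ctx_gid=int(m["ctx_gid"]),
        ctx_perm=m["ctx_perm"],
        updated_fop=m["fop"],
    )


def zeroed_ctx_gfids(lines: Iterable[str]) -> Counter:
    """Count denials per gfid whose context is root-owned with perm 000.

    Assumption: that combination marks the stale context seen in this thread.
    """
    counts: Counter = Counter()
    for line in lines:
        d = parse_acl_denial(line)
        if d is not None and d.ctx_uid == 0 and d.ctx_gid == 0 and d.ctx_perm == "000":
            counts[d.gfid] += 1
    return counts
```

Feeding a brick log file line by line into zeroed_ctx_gfids gives a per-gfid count that can then be mapped back to paths; the E-level OPENDIR lines simply do not match and are ignored.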
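In the stat output, both Access and Change read 1970-01-01 01:00:00 +0100, which is 0 seconds since the Unix epoch displayed in a UTC+1 timezone. To see how many entries share that pattern, one could walk the FUSE mount and list entries whose atime or ctime is exactly 0. The Python sketch below does only that; epoch_stamped is a hypothetical helper, and the idea that a zero timestamp marks affected entries is an assumption based on this single stat, not something the thread establishes. Run it as root, since unprivileged users hit the very errors being investigated.

```python
import os
from typing import Iterator, Tuple


def _candidates(root: str) -> Iterator[str]:
    """Yield root itself, then every directory and file below it."""
    yield root
    # os.walk silently skips directories it cannot list (onerror=None).
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def epoch_stamped(root: str) -> Iterator[Tuple[str, float, float]]:
    """Yield (path, atime, ctime) for entries whose atime or ctime is exactly 0.

    Assumption: a zero timestamp is a triage hint for heal-affected entries,
    not proof that an entry has the permission problem.
    """
    for path in _candidates(root):
        try:
            st = os.lstat(path)  # lstat: do not follow symlinks
        except OSError:
            continue  # vanished or unreadable entry: skip it
        if st.st_atime == 0 or st.st_ctime == 0:
            yield path, st.st_atime, st.st_ctime
```

For example, iterating over epoch_stamped("/path/to/fuse") and printing each path gives a list to compare against the gfids that show up in the denial logs.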
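Strahil's advice to enable brick trace logs and then switch them off again can be wrapped so the reset cannot be forgotten. The Python sketch below assumes the standard gluster CLI options "gluster volume set <VOLNAME> diagnostics.brick-log-level TRACE" and "gluster volume reset <VOLNAME> diagnostics.brick-log-level"; the helper names are hypothetical, and the volume name app_data is only inferred from the "0-app_data-server" prefix in the logs.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List, Tuple

Runner = Callable[[List[str]], object]

OPTION = "diagnostics.brick-log-level"


def run_cli(cmd: List[str]) -> None:
    # check=True raises CalledProcessError if the gluster CLI reports failure.
    subprocess.run(cmd, check=True)


def brick_log_cmds(volume: str, level: str = "TRACE") -> Tuple[List[str], List[str]]:
    """Return the (enable, reset) gluster CLI commands for the brick log level."""
    enable = ["gluster", "volume", "set", volume, OPTION, level]
    reset = ["gluster", "volume", "reset", volume, OPTION]
    return enable, reset


@contextmanager
def brick_trace_logging(volume: str, run: Runner = run_cli) -> Iterator[None]:
    """Raise the brick log level while the block runs, then always reset it."""
    enable, reset = brick_log_cmds(volume)
    run(enable)
    try:
        yield
    finally:
        # Reset even if the block raises, so TRACE output stops filling the log dir.
        run(reset)
```

Used as "with brick_trace_logging("app_data"):" around a single reproduction of the failing access as uid 498, this keeps the TRACE window as short as possible. The run parameter exists so the commands can be checked without a cluster.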
Alberto Bengoa
2020-Feb-11 10:01 UTC
[Gluster-users] Permission denied at some directories/files after a split brain
Hi Strahil,

On Mon, 10 Feb 2020 at 15:28, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> Hi Alberto,
> Sadly, you should verify whether the issue is the same.
> Enable trace logs for the bricks and check whether the errors in the logs
> match those in the bugzilla.
> Don't forget to stop the trace logging, or your log directory will fill up.
>

Yes, the log is quite similar:

[2020-02-10 10:38:17.402080] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-app_data-access-control: client: CTX_ID:7d744c50-43a1-4f81-9330-001b5dcaddb7-GRAPH_ID:0-PID:2310-HOST:ast10.local.domain-PC_NAME:app_data-client-1-RECON_NO:-1, gfid: 092f1e28-d6a8-4ca9-95d5-75dc8ad1c835, req(uid:498,gid:498,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
[2020-02-10 10:38:17.402182] E [MSGID: 115056] [server-rpc-fops_v2.c:687:server4_opendir_cbk] 0-app_data-server: 6257941: OPENDIR /mailboxes.old/8692/211411002/Old (092f1e28-d6a8-4ca9-95d5-75dc8ad1c835), client: CTX_ID:7d744c50-43a1-4f81-9330-001b5dcaddb7-GRAPH_ID:0-PID:2310-HOST:ast10.local.domain-PC_NAME:app_data-client-1-RECON_NO:-1, error-xlator: app_data-access-control [Permission denied]

> What version of Gluster are you using?
>

Gluster 6.6.

> In my case, only a downgrade restored the operation of the cluster, so
> you should consider that as an option (the last one, but still an option).
>

We created a copy of the faulty directory and put it in place of the old one to solve our issues for now. We kept the old one for further investigation.

> You can try running find against the FUSE mount:
> find /path/to/fuse -exec setfacl -m u:root:rw {} \;
> Maybe that will force Gluster to read the ACLs again.
>

Running setfacl doesn't make any difference. If we do a chmod, it "fixes" the permission problem.

> Good luck! If you have the option, join the next Gluster meeting and ask
> for an update (if the issue is actually the same).
>
> Best Regards,
> Strahil Nikolov

Thank you,
Alberto Bengoa
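Alberto reports that setfacl changes nothing but a chmod clears the error. A chmod also updates the inode's change time, which was 0 in the earlier stat; whether that is what matters is not established in the thread. The Python sketch below re-applies each path's current permission bits as root on the FUSE mount, so the mode stays the same and only a SETATTR passes through Gluster. That a same-mode chmod triggers the same refresh as the chmod Alberto ran is an assumption (the thread does not say which mode he used), and reapply_modes is a hypothetical helper that defaults to a dry run.

```python
import os
import stat
from typing import Iterable, List


def reapply_modes(paths: Iterable[str], dry_run: bool = True) -> List[str]:
    """chmod each path to its own current mode; return the paths handled.

    Assumption: a same-mode chmod issued as root on the FUSE mount refreshes
    the stale permission state the way Alberto's chmod did. Try it on one
    entry (dry_run=False) before running it on many.
    """
    handled: List[str] = []
    for path in paths:
        st = os.lstat(path)
        if stat.S_ISLNK(st.st_mode):
            continue  # os.chmod would follow the link and change its target instead
        if not dry_run:
            # S_IMODE keeps the setuid/setgid/sticky bits, e.g. the 2775 on 'Old'.
            os.chmod(path, stat.S_IMODE(st.st_mode))
        handled.append(path)
    return handled
```

Because the mode is read back from each entry, the setgid bit on directories such as 'Old' survives; a dry run first shows which paths would be touched.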