Alberto Bengoa
2020-Feb-10 13:53 UTC
[Gluster-users] Permission denied at some directories/files after a split brain
Hello guys,

We are running GlusterFS 6.6 in Replicate mode (1 x 3). After a split-brain and a massive heal process, we noticed that our app started to receive thousands of permission-denied errors while trying to access files and directories.

Example log of a failed access attempt to a specific directory:

[2020-02-10 10:38:17.402080] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-app_data-access-control: client: CTX_ID:7d744c50-43a1-4f81-9330-001b5dcaddb7-GRAPH_ID:0-PID:2310-HOST:ast10.local.domain-PC_NAME:app_data-client-1-RECON_NO:-1, gfid: 092f1e28-d6a8-4ca9-95d5-75dc8ad1c835, req(uid:498,gid:498,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
[2020-02-10 10:38:17.402182] E [MSGID: 115056] [server-rpc-fops_v2.c:687:server4_opendir_cbk] 0-app_data-server: 6257941: OPENDIR /mailboxes.old/8692/211411002/Old (092f1e28-d6a8-4ca9-95d5-75dc8ad1c835), client: CTX_ID:7d744c50-43a1-4f81-9330-001b5dcaddb7-GRAPH_ID:0-PID:2310-HOST:ast10.local.domain-PC_NAME:app_data-client-1-RECON_NO:-1, error-xlator: app_data-access-control [Permission denied]

The permission denied happens only to unprivileged users, even if that unprivileged user is the directory owner. The root user is able to access all files, and if we "touch" the file/directory as root it *sometimes* fixes the problem.

We noticed inconsistent Access/Change dates. Here is the stat output of a directory before touching it, showing these inconsistencies:

  File: ‘Old’
  Size: 4096  Blocks: 8  IO Block: 131072  directory
Device: 27h/39d  Inode: 10388898073370567318  Links: 2
Access: (2775/drwxrwsr-x)  Uid: ( 498/app)  Gid: ( 498/app)
Access: 1970-01-01 01:00:00.000000000 +0100
Modify: 2020-02-07 13:21:10.365297527 +0000
Change: 1970-01-01 01:00:00.000000000 +0100
 Birth: -

I think this case is similar to the one reported here [1] and discussed in the thread "ACL issue v6.6, v6.7, v7.1, v7.2", despite the fact that we are not using libvirt.
We do use ACLs, but not in this particular directory.

Any thoughts on this?

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1797099

Thanks,
Alberto Bengoa
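To get a rough idea of how many entries show the zeroed Access/Change timestamps described above, something like the following Python sketch could be run against the FUSE mount. It is only an illustration: the function names `zeroed_fields` and `scan` are made up for this example, and it assumes that a timestamp of exactly 0 (1970-01-01 UTC, which is what the stat output shows at +0100) is the symptom worth looking for. It only reads metadata with `lstat` and changes nothing.

```python
import os
from typing import Iterator, List, Tuple


def zeroed_fields(st: os.stat_result) -> List[str]:
    """Return the names of the timestamps that are exactly the Unix epoch (0)."""
    fields = (
        ("atime", st.st_atime),
        ("mtime", st.st_mtime),
        ("ctime", st.st_ctime),
    )
    return [name for name, value in fields if value == 0]


def scan(root: str) -> Iterator[Tuple[str, List[str]]]:
    """Walk everything below root (root itself is not checked) and yield
    (path, zeroed timestamp names) for each entry with an epoch timestamp."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                # Entry vanished or is not accessible; skip it.
                continue
            hits = zeroed_fields(st)
            if hits:
                yield path, hits


# Example use (the mount point is a placeholder, not taken from this thread):
# for path, hits in scan("/path/to/fuse"):
#     print(path, ",".join(hits))
```

Running it as root avoids the permission-denied errors themselves hiding entries from the walk; on a large volume the walk will take a while, so starting with one affected subtree is probably wiser.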
Strahil Nikolov
2020-Feb-10 15:28 UTC
[Gluster-users] Permission denied at some directories/files after a split brain
On February 10, 2020 3:53:08 PM GMT+02:00, Alberto Bengoa <bengoa at gmail.com> wrote:
>I think this case is similar to the reported here[1] and discussed at
>thread "ACL issue v6.6, v6.7, v7.1, v7.2", despite the fact that we are
>not using libvirt. We do use ACLs, but not in this particular directory.
>
>Any thoughts on this?
>
>[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1797099

Hi Alberto,

Sadly, you will have to verify whether the issue is the same. Enable trace logging for the bricks and check whether the errors in the logs match those in the Bugzilla report. Don't forget to turn trace logging off afterwards, or your log directory will fill up.

What version of gluster are you using?

In my case only a downgrade restored the operation of the cluster, so you should consider that as an option (a last resort, but still an option).

You can also try running a find against the FUSE mount:

find /path/to/fuse -exec setfacl -m u:root:rw {} \;

Maybe that will force gluster to read the ACLs again.

Good luck! If you have the option, join the next gluster meeting and ask for an update (if the issue is actually the same).

Best Regards,
Strahil Nikolov
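For comparing brick log entries against the Bugzilla report, a small parser can pull the relevant fields out of the posix-acl denial lines. The Python below is a minimal sketch, assuming the lines have the same layout as the MSGID 139001 example quoted earlier in this thread. The names `parse_denial` and `ctx_looks_unset` are hypothetical, and treating `perm:000` together with `updated-fop:INVALID` as a sign of an unpopulated ACL context is a heuristic based only on that one log line, not on documented GlusterFS behaviour.

```python
import re
from typing import Dict, Optional

# Matches the gfid, req(...) and ctx(...) parts of a
# posix_acl_log_permit_denied line as quoted in this thread.
DENIAL_RE = re.compile(
    r"gfid: (?P<gfid>[0-9a-f-]{36}),\s*"
    r"req\(uid:(?P<req_uid>\d+),gid:(?P<req_gid>\d+),"
    r"perm:(?P<req_perm>\d+),ngrps:(?P<ngrps>\d+)\),\s*"
    r"ctx\(uid:(?P<ctx_uid>\d+),gid:(?P<ctx_gid>\d+),"
    r"in-groups:(?P<in_groups>\d+),perm:(?P<ctx_perm>\d+),"
    r"updated-fop:(?P<fop>[^,]+),\s*acl:(?P<acl>[^)]*)\)"
)


def parse_denial(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of a posix-acl denial line as strings, or None
    if the line does not have the expected layout."""
    match = DENIAL_RE.search(line)
    return match.groupdict() if match else None


def ctx_looks_unset(fields: Dict[str, str]) -> bool:
    """Heuristic (an assumption, see above): the cached context carries
    no permission bits and no file operation ever updated it."""
    return fields["ctx_perm"] == "000" and fields["fop"] == "INVALID"
```

Feeding each line of a brick log through `parse_denial` and counting how many results satisfy `ctx_looks_unset` gives a quick way to see whether the denials all share the signature shown in the original post.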