Frederik Ferner
2010-May-12 12:15 UTC
[Lustre-discuss] problem with too many (default) ACLs on a directory
Hi,

we are having problems with ACLs at the moment. As far as we understand, this is what has happened.

We have a directory with 33 default ACLs on it in addition to 32 other ACLs.

Our problem started when a user created a subdirectory in the directory with the 33 default ACLs. This worked, but the new directory is now inaccessible. The number of ACLs on the initial directory does not seem to matter.

I hope this will be clearer with a short example from our test file system, where I managed to reproduce it:

[bnh65367@cs04r-sc-serv-06 testdir]$ getfacl .
# file: .
# owner: bnh65367
# group: bnh65367
user::rwx
group::rwx
group:dls_sysadmin:rwx
mask::rwx
other::r-x
default:user::rwx
default:group::rwx
default:group:dls_sysadmin:rwx
default:group:ltest1:r--
default:group:ltest2:r--
default:group:ltest3:r--
default:group:ltest4:r--
default:group:ltest5:r--
default:group:ltest6:r--
default:group:ltest7:r--
default:group:ltest8:r--
default:group:ltest9:r--
default:group:ltest10:r--
default:group:ltest11:r--
default:group:ltest12:r--
default:group:ltest13:r--
default:group:ltest14:r--
default:group:ltest15:r--
default:group:ltest16:r--
default:group:ltest17:r--
default:group:ltest18:r--
default:group:ltest19:r--
default:group:ltest20:r--
default:group:ltest21:r--
default:group:ltest22:r--
default:group:ltest23:r--
default:group:ltest24:r--
default:group:ltest25:r--
default:group:ltest26:r--
default:group:ltest27:r--
default:group:ltest28:r-x
default:mask::rwx
default:other::r-x

[bnh65367@cs04r-sc-serv-06 testdir]$ mkdir testdir3
[bnh65367@cs04r-sc-serv-06 testdir]$ ls -ld testdir3
ls: testdir3: Numerical result out of range
[bnh65367@cs04r-sc-serv-06 testdir]$ ls -l
total 8
drwxrwxr-x+ 2 bnh65367 bnh65367 4096 May 12 11:23 testdir
?---------  ? ?        ?           ?            ? testdir2
?---------  ? ?        ?           ?            ? testdir3
-rw-rw-r--+ 1 bnh65367 bnh65367    0 May 12 11:24 testfile
[bnh65367@cs04r-sc-serv-06 testdir]$ stat testdir3
stat: cannot stat `testdir3': Numerical result out of range
[bnh65367@cs04r-sc-serv-06 testdir]$

(The other entries in there were created when the number of default ACLs was only 32.)

We were also not able to create any files in that directory:

[bnh65367@cs04r-sc-serv-06 testdir]$ touch testfile3
touch: cannot touch `testfile3': Numerical result out of range

The following log entry on the MDS seems related; I could not find any error on the client.

May 12 12:50:56 cs04r-sc-mds02-01 kernel: LustreError: 3329:0:(handler.c:732:mds_pack_posix_acl()) buflen 260, get acl: -34
May 12 12:50:56 cs04r-sc-mds02-01 kernel: LustreError: 3329:0:(handler.c:732:mds_pack_posix_acl()) Skipped 3 previous similar messages

(-34 is -ERANGE)

We found bug #17636, which seems related but is not quite the same issue and is apparently fixed in the version we are using.

In a test we were able to apply up to 32 ACLs to a file; the 33rd ACL failed with the message "Operation not supported".

Does anyone have any idea how we could get access to these directories back? Just removing some of the ACLs did not work, as it seems setfacl stats the directory first or something:

[bnh65367@cs04r-sc-serv-06 testdir]$ setfacl -x group:ltest1: testdir3
setfacl: testdir3: Numerical result out of range
[bnh65367@cs04r-sc-serv-06 testdir]$

This is with Lustre 1.6.7.2.ddn3.5 on client and MDS; both are running RHEL5, if it makes a difference.

Kind regards,
Frederik

-- 
Frederik Ferner
Computer Systems Administrator   phone: +44 1235 77 8624
Diamond Light Source Ltd.        mob:   +44 7917 08 5110

(Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.)
Andreas Dilger
2010-May-13 08:54 UTC
[Lustre-discuss] problem with too many (default) ACLs on a directory
On 2010-05-12, at 06:15, Frederik Ferner wrote:
> we are having problems with ACLs at the moment. As far as we understand
> this is what has happened.
>
> We have a directory with 33 default ACLs on it in addition to 32 other ACLs.
>
> Our problem started when a user created a subdirectory in the directory
> with the 33 default ACLs. This worked but the new directory now is
> inaccessible. The number of ACLs on the initial directory does not seem
> to matter.

For a long time there was a kernel limit of 32 ACLs on a single file. Looking at newer kernel code, it seems this limit is no longer present (it just tries to store the maximum xattr size possible to hold the ACL).

I see in the Lustre code that we have some constants still related to this:

# define LUSTRE_POSIX_ACL_MAX_ENTRIES (32)
# define LUSTRE_POSIX_ACL_MAX_SIZE \
        (mds_xattr_acl_size(LUSTRE_POSIX_ACL_MAX_ENTRIES))

It does seem that the MDS should prevent storing an xattr that is larger than this size, but it is possible that if you are building the ACL incrementally it misses this limit check. It may very well be that creating a default ACL bypasses this limit, and then when it is inherited by the new directory it breaks...

The relevant code is:

int mds_setxattr_internal(struct ptlrpc_request *req, struct mds_body *body)
{
        /* currently lustre limit xattr size */
        if (body->valid & OBD_MD_FLXATTR &&
            !strcmp(xattr_name, XATTR_NAME_ACL_ACCESS)) {
                xattrlen = lustre_msg_buflen(req->rq_reqmsg,
                                             REQ_REC_OFF + 2);

                if (xattrlen > LUSTRE_POSIX_ACL_MAX_SIZE)
                        GOTO(out, -ERANGE);
        }

but it should also check if the xattr_name is XATTR_NAME_ACL_DEFAULT:

        if (body->valid & OBD_MD_FLXATTR &&
            (!strcmp(xattr_name, XATTR_NAME_ACL_ACCESS) ||
             !strcmp(xattr_name, XATTR_NAME_ACL_DEFAULT))) {

Having a fixed maximum number of ACLs is important for Lustre, since the RDMA reply buffers have to be allocated before the client knows how many ACLs are stored on the file.

Cheers, Andreas
-- 
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
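
[Editorial sketch, not part of the original message.] The arithmetic behind the limit may help when reading the MDS log from the first message. The sketch below assumes that mds_xattr_acl_size() follows the standard Linux POSIX ACL xattr layout (a 4-byte version header followed by 8-byte entries); that definition is not quoted in this thread, so treat it as an assumption. Under it, 32 entries need 260 bytes, which matches the "buflen 260" in the log, and the 33 default entries in the getfacl listing need 268 bytes. The names acl_xattr_size() and check_acl_xattr() are hypothetical stand-ins, and the check mirrors the condition proposed above rather than the actual Lustre source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard Linux POSIX ACL xattr layout (assumed to be what
 * mds_xattr_acl_size() measures): 4-byte header, 8-byte entries. */
struct acl_xattr_header { uint32_t a_version; };
struct acl_xattr_entry  { uint16_t e_tag; uint16_t e_perm; uint32_t e_id; };

#define ACL_MAX_ENTRIES   32 /* mirrors LUSTRE_POSIX_ACL_MAX_ENTRIES */
#define XATTR_ACL_ACCESS  "system.posix_acl_access"
#define XATTR_ACL_DEFAULT "system.posix_acl_default"

/* Hypothetical helper: bytes needed to hold an ACL with `entries` entries. */
static size_t acl_xattr_size(size_t entries)
{
    return sizeof(struct acl_xattr_header) +
           entries * sizeof(struct acl_xattr_entry);
}

/* Hypothetical stand-in for the proposed check: reject an oversized
 * access OR default ACL; other xattr names are not size-limited here.
 * Returns 0 if acceptable, -ERANGE if the ACL is too large. */
static int check_acl_xattr(const char *name, size_t len)
{
    assert(name != NULL);
    if ((strcmp(name, XATTR_ACL_ACCESS) == 0 ||
         strcmp(name, XATTR_ACL_DEFAULT) == 0) &&
        len > acl_xattr_size(ACL_MAX_ENTRIES))
        return -ERANGE;
    return 0;
}
```

With this model, check_acl_xattr(XATTR_ACL_DEFAULT, acl_xattr_size(33)) is rejected with -ERANGE at set time, instead of the oversized default ACL being stored and only failing later when a child inherits it.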
Frederik Ferner
2010-May-13 10:38 UTC
[Lustre-discuss] problem with too many (default) ACLs on a directory
Andreas Dilger wrote:
> On 2010-05-12, at 06:15, Frederik Ferner wrote:
>> we are having problems with ACLs at the moment. As far as we understand
>> this is what has happened.
>>
>> We have a directory with 33 default ACLs on it in addition to 32 other ACLs.
>>
>> Our problem started when a user created a subdirectory in the directory
>> with the 33 default ACLs. This worked but the new directory now is
>> inaccessible. The number of ACLs on the initial directory does not seem
>> to matter.
>
> For a long time there was a kernel limit of 32 ACLs on a single file.
> Looking at newer kernel code it seems this limit is no longer
> present (it just tries to store the maximum xattr size possible to
> hold the ACL).

You are correct, the limit is at least higher than 32 ACLs currently. In my tests on a RHEL5 system on an ext3 FS, I managed to create and read 100 ACLs with no problem.

> I see in the Lustre code that we have some constants still related to this:
>
> # define LUSTRE_POSIX_ACL_MAX_ENTRIES (32)
> # define LUSTRE_POSIX_ACL_MAX_SIZE \
>         (mds_xattr_acl_size(LUSTRE_POSIX_ACL_MAX_ENTRIES))
>
> It does seem that the MDS should prevent storing an xattr that is
> larger than this size, but it is possible that if you are building
> the ACL incrementally it misses this limit check. It may very well
> be that by creating a default ACL will bypass this limit and then
> when it is inherited by the new directory it breaks...

That seems to be what's happening, yes. I can set 33 default ACLs on a directory, which then breaks access to newly created files and directories in that directory.

Note that I can still access all files when mounting an LVM snapshot of the MDT as ldiskfs. I may try to shut down our test file system MDT later and reduce the number of ACLs on my inaccessible test files. Should I expect any problems from this?

> The relevant code is:
>
> int mds_setxattr_internal(struct ptlrpc_request *req, struct mds_body *body)
> {
>         /* currently lustre limit xattr size */
>         if (body->valid & OBD_MD_FLXATTR &&
>             !strcmp(xattr_name, XATTR_NAME_ACL_ACCESS)) {
>                 xattrlen = lustre_msg_buflen(req->rq_reqmsg,
>                                              REQ_REC_OFF + 2);
>
>                 if (xattrlen > LUSTRE_POSIX_ACL_MAX_SIZE)
>                         GOTO(out, -ERANGE);
>         }
>
> but it should also check if the xattr_name is XATTR_NAME_ACL_DEFAULT.
>
>         if (body->valid & OBD_MD_FLXATTR &&
>             (!strcmp(xattr_name, XATTR_NAME_ACL_ACCESS) ||
>              !strcmp(xattr_name, XATTR_NAME_ACL_DEFAULT))) {

Should we open a bug to track this?

> Having a fixed maximum number of ACLs is important for Lustre, since
> the RDMA reply buffers have to be allocated before the client knows
> how many ACLs are stored on the file.

Ah yes. Would it be possible to increase this number? I guess not easily, as I can imagine it might break interoperability between clients which still use the old limit and an MDS with the new limit.

Thanks,
Frederik
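
[Editorial sketch, not part of the original message.] When the MDT (or a snapshot of it) is mounted as ldiskfs, one way to find which files and directories carry too many entries might be to read the size of the ACL xattr and convert it to an entry count. This is a sketch under assumptions: it relies on the standard Linux getxattr(2) call and on the usual 4-byte header plus 8-byte entry layout, it has not been tried on ldiskfs, and the helper names acl_entries_from_size() and acl_entry_count() are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h>

#define ACL_XATTR_HEADER_SIZE 4 /* version field */
#define ACL_XATTR_ENTRY_SIZE  8 /* tag, perm, id */

/* Convert an ACL xattr length in bytes to a number of entries.
 * Returns -1 if the length cannot be a well-formed ACL xattr. */
static long acl_entries_from_size(long size)
{
    if (size < ACL_XATTR_HEADER_SIZE ||
        (size - ACL_XATTR_HEADER_SIZE) % ACL_XATTR_ENTRY_SIZE != 0)
        return -1;
    return (size - ACL_XATTR_HEADER_SIZE) / ACL_XATTR_ENTRY_SIZE;
}

/* Number of entries in the named ACL xattr on `path`, for example
 * "system.posix_acl_default". Returns 0 if the xattr is absent and
 * -1 on any other error (errno is left as set by getxattr). */
static long acl_entry_count(const char *path, const char *xattr_name)
{
    /* A zero-length buffer asks getxattr() only for the value's size. */
    ssize_t size = getxattr(path, xattr_name, NULL, 0);

    if (size < 0)
        return (errno == ENODATA) ? 0 : -1;
    return acl_entries_from_size((long)size);
}
```

A directory for which acl_entry_count(path, "system.posix_acl_default") returns more than 32 would be a candidate for trimming before the MDT is put back into service.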
Andreas Dilger
2010-May-13 22:03 UTC
[Lustre-discuss] problem with too many (default) ACLs on a directory
On 2010-05-13, at 04:38, Frederik Ferner wrote:
> Andreas Dilger wrote:
>> On 2010-05-12, at 06:15, Frederik Ferner wrote:
>>> we are having problems with ACLs at the moment. As far as we understand
>>> this is what has happened.
>>>
>>> We have a directory with 33 default ACLs on it in addition to 32 other ACLs.
>>>
>>> Our problem started when a user created a subdirectory in the directory
>>> with the 33 default ACLs. This worked but the new directory now is
>>> inaccessible. The number of ACLs on the initial directory does not seem
>>> to matter.
>>
>> For a long time there was a kernel limit of 32 ACLs on a single file.
>> Looking at newer kernel code it seems this limit is no longer
>> present (it just tries to store the maximum xattr size possible to
>> hold the ACL).
>
> You are correct, the limit is at least higher than 32 ACLs currently. In my tests on a RHEL5 system on an ext3 FS, I managed to create and read 100 ACLs with no problem.

Definitely worth filing a bug for this.

>> I see in the Lustre code that we have some constants still related to this:
>> # define LUSTRE_POSIX_ACL_MAX_ENTRIES (32)
>> # define LUSTRE_POSIX_ACL_MAX_SIZE \
>>         (mds_xattr_acl_size(LUSTRE_POSIX_ACL_MAX_ENTRIES))
>>
>> It does seem that the MDS should prevent storing an xattr that is
>> larger than this size, but it is possible that if you are building
>> the ACL incrementally it misses this limit check. It may very well
>> be that by creating a default ACL will bypass this limit and then
>> when it is inherited by the new directory it breaks...
>
> That seems to be what's happening, yes. I can set 33 default ACLs on a directory which then breaks access to newly created files and directories in the directory.
>
> Note that I can still access all files when mounting a LVM snapshot of the MDT as ldiskfs. I may try to shutdown our test file system MDT later and reduce the number of ACLs on my inaccessible test files. Should I expect any problems from this?

No, this should work OK.

>> The relevant code is:
>> int mds_setxattr_internal(struct ptlrpc_request *req, struct mds_body *body)
>> {
>>         /* currently lustre limit xattr size */
>>         if (body->valid & OBD_MD_FLXATTR &&
>>             !strcmp(xattr_name, XATTR_NAME_ACL_ACCESS)) {
>>                 xattrlen = lustre_msg_buflen(req->rq_reqmsg,
>>                                              REQ_REC_OFF + 2);
>>                 if (xattrlen > LUSTRE_POSIX_ACL_MAX_SIZE)
>>                         GOTO(out, -ERANGE);
>>         }
>> but it should also check if the xattr_name is XATTR_NAME_ACL_DEFAULT.
>>         if (body->valid & OBD_MD_FLXATTR &&
>>             (!strcmp(xattr_name, XATTR_NAME_ACL_ACCESS) ||
>>              !strcmp(xattr_name, XATTR_NAME_ACL_DEFAULT))) {
>
> Should we open a bug to track this?

Please do. If it includes this comment as a patch, it will likely make it into 1.8.4.

>> Having a fixed maximum number of ACLs is important for Lustre, since
>> the RDMA reply buffers have to be allocated before the client knows
>> how many ACLs are stored on the file.
>
> Ah yes. Would it be possible to increase this number? I guess not easily as I can imagine it might break interoperability between clients which still use the old limit and a MDS with the new limit.

Yes, it won't be trivial. That said, if we change the client now to allow more ACLs (say, up to 128 or so), we can change the server at some later date to start sending more. It would be difficult to make it dynamic based on the reply size, because one client may allow the larger size while the file becomes inaccessible from another client that doesn't handle this larger size. I think we could only start using the larger size at some major version change where we know older clients will no longer be supported.

Cheers, Andreas
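
[Editorial sketch, not part of the original message.] The interoperability concern can be illustrated with a toy model. It is not Lustre code: every name is hypothetical, and the only figures taken from the thread are the current limit of 32 entries and the suggested future client limit of around 128. Each client reserves reply space for its own compiled-in maximum before it knows how long the stored ACL is, so a stored ACL is only safe if the smallest buffer among all supported clients can hold it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bytes needed for an ACL reply with `entries` entries, assuming the
 * usual 4-byte header plus 8 bytes per entry. */
static size_t acl_reply_size(size_t entries)
{
    return 4 + 8 * entries;
}

/* A client sizes its reply buffer from its own maximum, so it can only
 * receive ACLs that fit into that pre-allocated space. */
static bool client_can_receive(size_t client_max_entries,
                               size_t stored_entries)
{
    return acl_reply_size(stored_entries) <=
           acl_reply_size(client_max_entries);
}

/* The server may only store an ACL that every supported client can
 * still receive; otherwise the file becomes inaccessible from the
 * clients with the smaller buffer. */
static bool server_may_store(const size_t *client_max, size_t nclients,
                             size_t entries)
{
    for (size_t i = 0; i < nclients; i++)
        if (!client_can_receive(client_max[i], entries))
            return false;
    return true;
}
```

In this model a 33-entry ACL is readable by a client built for 128 entries but not by one built for 32, which is why the server-side limit could only be raised once no 32-entry clients remain supported.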
Frederik Ferner
2010-May-14 15:40 UTC
[Lustre-discuss] problem with too many (default) ACLs on a directory
Andreas Dilger wrote:
> On 2010-05-13, at 04:38, Frederik Ferner wrote:
>> Andreas Dilger wrote:
>>> The relevant code is:
>>> int mds_setxattr_internal(struct ptlrpc_request *req, struct mds_body *body)
>>> {
>>>         /* currently lustre limit xattr size */
>>>         if (body->valid & OBD_MD_FLXATTR &&
>>>             !strcmp(xattr_name, XATTR_NAME_ACL_ACCESS)) {
>>>                 xattrlen = lustre_msg_buflen(req->rq_reqmsg,
>>>                                              REQ_REC_OFF + 2);
>>>                 if (xattrlen > LUSTRE_POSIX_ACL_MAX_SIZE)
>>>                         GOTO(out, -ERANGE);
>>>         }
>>> but it should also check if the xattr_name is XATTR_NAME_ACL_DEFAULT.
>>>         if (body->valid & OBD_MD_FLXATTR &&
>>>             (!strcmp(xattr_name, XATTR_NAME_ACL_ACCESS) ||
>>>              !strcmp(xattr_name, XATTR_NAME_ACL_DEFAULT))) {
>> Should we open a bug to track this?
>
> Please do. If it includes this comment as a patch, it will likely make it into 1.8.4.

Done: https://bugzilla.lustre.org/show_bug.cgi?id=22820

Thanks,
Frederik