Hendelman, Rob
2008-Dec-18 20:02 UTC
[Lustre-discuss] Lustre error: ll_inode_revalidate_fini failure -43
I have seen this error on several clients:

LustreError: 6200:0:(file.c:2513:ll_inode_revalidate_fini()) failure -43 inode 36134281

Does anyone know what this refers to? Multiple clients report it, always for the same inode, and I don't see it in the logs for any other inode.

Googling (http://www.mail-archive.com/lustre-discuss at lists.lustre.org/msg00201.html) turns up a post by Andreas Dilger from Feb 2008 saying that the MDS has a user database that is missing a particular UID.

How can I find the actual file with this inode on the filesystem to see what UID is assigned to it? The MGS/MDS/OSS/clients should all have the same UID info via NIS.

Thanks,
Robert

P.S. All our servers are x86_64 CentOS with the actual Lustre RPMs installed (1.6.5.1 IIRC) and the clients are 2.6.22 patchless Ubuntu.

The information contained in this message and its attachments is intended only for the private and confidential use of the intended recipient(s). If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly prohibited.
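For the question of mapping an inode number back to a path and owner, a minimal sketch follows. It assumes the inode number in the log message is the same number a client sees through stat(), which this thread does not confirm, and `find_by_inode` is a hypothetical helper name. It does roughly what `find <mountpoint> -inum 36134281` would do, and also reports the numeric uid and gid.

```python
import os


def find_by_inode(root, inode):
    """Return (path, uid, gid) for entries under root whose inode number matches."""
    # Inode numbers are only unique per filesystem, so remember the device of
    # the starting directory and ignore entries that live on other mounts.
    root_dev = os.lstat(root).st_dev
    matches = []

    def check(path):
        try:
            st = os.lstat(path)
        except OSError:
            # Entries that cannot be stat'ed are skipped rather than aborting
            # the whole walk.
            return
        if st.st_dev == root_dev and st.st_ino == inode:
            matches.append((path, st.st_uid, st.st_gid))

    check(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            check(os.path.join(dirpath, name))
    return matches
```

Calling `find_by_inode("/path/to/lustremntpoint", 36134281)` walks the whole tree, so on a large filesystem it may take a long time. Hard links show up as several paths with the same inode.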
Brian J. Murrell
2008-Dec-18 21:13 UTC
[Lustre-discuss] Lustre error: ll_inode_revalidate_fini failure -43
On Thu, 2008-12-18 at 14:02 -0600, Hendelman, Rob wrote:
> I have seen this error on several clients:
>
> LustreError: 6200:0:(file.c:2513:ll_inode_revalidate_fini()) failure -43 inode 36134281

-EIDRM

> How can I find the actual file with this inode on the filesystem to see what uid number is assigned to it? The mgs/mds/oss/clients all should have the same UID info using NIS.

Have you read the ops manual with regard to l_getgroups? If not, please do that and see if you have any more questions.

Cheers,
b.
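To make the step from "failure -43" to "-EIDRM" explicit, here is a small sketch that pulls the error code and inode out of a log line of the form quoted above and looks the code up in the local errno table. The function names and the regular expression are illustrative assumptions, not part of Lustre.

```python
import errno
import os
import re

# Matches the tail of the message quoted in this thread, e.g.
# "...ll_inode_revalidate_fini()) failure -43 inode 36134281"
PATTERN = re.compile(
    r"ll_inode_revalidate_fini\(\)\)\s+failure\s+(-?\d+)\s+inode\s+(\d+)"
)


def parse_failure(line):
    """Return (errno_code, inode) from a matching log line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # The kernel logs the code as a negative number; errno tables are positive.
    return abs(int(match.group(1))), int(match.group(2))


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this host."""
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)
```

Errno numbering is platform specific, so `describe_errno(43)` should be run on a Linux host to match the EIDRM ("Identifier removed") reading given in this thread.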
Hendelman, Rob
2008-Dec-18 22:11 UTC
[Lustre-discuss] Lustre error: ll_inode_revalidate_fini failure -43
>> I have seen this error on several clients:
>>
>> LustreError: 6200:0:(file.c:2513:ll_inode_revalidate_fini()) failure -43 inode 36134281
>
> -EIDRM

Not sure what EIDRM is exactly, but I'm guessing it refers to your message here: http://lists.lustre.org/pipermail/lustre-discuss/2008-February/006593.html

>> How can I find the actual file with this inode on the filesystem to see what uid number is assigned to it? The mgs/mds/oss/clients all should have the same UID info using NIS.
>
> Have you read the ops manual with regard to l_getgroups? If not, please
> do that and see if you have any more questions.

I would never have guessed to search for this, since the error message talks about an inode. I was searching for inode information in the Lustre manual. I wanted to at least identify the file attached to the inode that is having the problem (invalid/unknown uid/gid).

From 32.5.9:

My /proc/fs/lustre/mds/{mdtname}/group_upcall points to /usr/sbin/l_getgroups
My /proc/fs/lustre/mds/{mdtname}/group_info is empty

Grepping /var/log/* for l_getgroups gets me a bunch of "no such user 113" messages.

Running find on the client against the Lustre mountpoint to find files owned by user 113 doesn't return anything. A test find (the local find, not lfs find) does seem to find files by a UID I specify.

uid 113 on the client is "nagios". Nobody should be logging in as nagios, since nagios is only used to run the nrpe daemon. gid 113 on the client is smmta.

The only explanation I can think of is that our nagios box is talking to the Lustre client to check free space on the mountpoint, and when it tries to access the mountpoint it gets the error mentioned in the above thread.

Actually, I just temporarily gave the nagios user a shell and su'd to nagios. When I try to do an ls in /path/to/lustremntpoint I get the "identifier removed" error as shown in the Feb 2008 thread.

I'm guessing the correct solution is to add a local nagios user on the MDS with uid/gid 113. Do I also need to do this for the MGS (in my case the same box, but it would be good to know for the future) and the OSSes?

Thanks for shedding some light on this. The key was the l_getgroups you mentioned. After that, a lot of things clicked into place.

Best regards,
Robert
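The "no such user 113" messages suggest that a UID known on the client cannot be resolved on the MDS. A rough way to check that on each host is sketched below, assuming Python is available there. This is not how l_getgroups itself is implemented; it only asks the local passwd and group databases the same kind of question through the standard library, and `resolve_uid` is a hypothetical helper name.

```python
import grp
import pwd


def resolve_uid(uid):
    """Return name, primary gid and group ids for uid, or None if unknown."""
    try:
        entry = pwd.getpwuid(uid)
    except KeyError:
        # The local databases have no user with this uid.
        return None
    # Supplementary groups are those that list the user as a member; the
    # primary group comes from the passwd entry itself.
    gids = {g.gr_gid for g in grp.getgrall() if entry.pw_name in g.gr_mem}
    gids.add(entry.pw_gid)
    return {
        "uid": entry.pw_uid,
        "name": entry.pw_name,
        "gid": entry.pw_gid,
        "groups": sorted(gids),
    }
```

Running `resolve_uid(113)` on the MDS and on a client and comparing the results shows whether the two hosts agree about that user. A `None` on the MDS would be consistent with the messages in the log.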
Brian J. Murrell
2008-Dec-18 23:11 UTC
[Lustre-discuss] Lustre error: ll_inode_revalidate_fini failure -43
On Thu, 2008-12-18 at 16:11 -0600, Hendelman, Rob wrote:
> I would have never really guessed to search for this since the error message talks about an inode. I was searching for inode information in the lustre manual. I wanted to at least identify the file attached to the inode that is having the problem (invalid/unknown uid/gid).

Yeah, it's not obvious where to go given the error message. This really boils down to good/complete initial configuration that meets your requirements.

> from 32.5.9:
> My /proc/fs/lustre/mds/{mdtname}/group_upcall points to /usr/sbin/l_getgroups

When that proc variable is set as such (which might just be its default value) you are asserting that you want supplemental groups to work. If you don't care about supplemental groups, the easiest thing to do is set that value to NONE.

If you want supplemental groups to work then the passwd/group databases on the MDS(es) and clients MUST be the same. Not just for your usual login users, but for any (i.e. system) user that might be making any kind of query of Lustre, as you have discovered.

> I'm guessing the correct solution is to add a local nagios user with uid/gid on the mds with the 113 uid. Do I also need to do this for the mgs (in my case the same box, but it would be good to know for the future) and the oss's ?

Just the MDS(es) and clients need a common database. The OSSes are not involved in this aspect.

> Thanks for shedding some light on this.

NP.

b.
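Since the requirement is that the MDS and the clients share the same passwd database, one way to spot gaps such as the missing uid 113 is to compare passwd-format dumps from the two hosts. The sketch below assumes the dumps were captured as text (for example, the output of `getent passwd` on each machine). `parse_passwd` and `compare_passwd` are hypothetical names, and only users are compared here, not groups.

```python
def parse_passwd(text):
    """Map uid -> (name, gid) from passwd-format text (name:pw:uid:gid:...)."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue
        try:
            uid, gid = int(fields[2]), int(fields[3])
        except ValueError:
            # Lines without numeric ids (such as NIS "+" markers) are skipped.
            continue
        # Keep the first entry for a uid, as a lookup by uid would.
        users.setdefault(uid, (fields[0], gid))
    return users


def compare_passwd(mds_text, client_text):
    """Report uids missing on either side or defined differently."""
    mds, client = parse_passwd(mds_text), parse_passwd(client_text)
    shared = set(mds) & set(client)
    return {
        "missing_on_mds": sorted(set(client) - set(mds)),
        "missing_on_client": sorted(set(mds) - set(client)),
        "mismatched": sorted(u for u in shared if mds[u] != client[u]),
    }
```

Any uid listed under `missing_on_mds` is a candidate for the kind of failure described in this thread, including system accounts that never log in interactively.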