I got this error today when testing a newly set up Lustre 1.6 filesystem:

n50 1% cd /mnt/test
n50 2% ls
ls: reading directory .: Identifier removed

n50 3% ls -alrt
total 8
?--------- ? ? ? ? ? dir1
?--------- ? ? ? ? ? dir2
drwxr-xr-x 4 root root 4096 Feb 8 15:46 ../
drwxr-xr-x 4 root root 4096 Feb 11 15:11 ./

n50 4% stat .
  File: `.'
  Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: b438c888h/-1271347064d Inode: 27616681 Links: 2
Access: (0755/drwxr-xr-x) Uid: ( 1120/ faxen) Gid: ( 500/ nsc)
Access: 2008-02-11 16:11:48.336621154 +0100
Modify: 2008-02-11 15:11:27.000000000 +0100
Change: 2008-02-11 15:11:31.352841294 +0100

This seems to happen almost every time I run as one specific user on this system. Note that the stat call always works. I haven't yet been able to reproduce the problem when running as my own user.

dmesg from the client:

LustreError: 9000:0:(dir.c:406:ll_readdir()) error reading dir 27583921/2381382571 page 0: rc -43
LustreError: 9019:0:(dir.c:406:ll_readdir()) error reading dir 27583921/2381382571 page 0: rc -43
LustreError: 9020:0:(dir.c:406:ll_readdir()) error reading dir 27583921/2381382571 page 0: rc -43
LustreError: 9021:0:(dir.c:406:ll_readdir()) error reading dir 27583921/2381382571 page 0: rc -43
LustreError: 9022:0:(dir.c:406:ll_readdir()) error reading dir 4848481/4054352687 page 0: rc -43
LustreError: 9127:0:(file.c:2413:ll_inode_revalidate_fini()) failure -43 inode 27616681
LustreError: 9128:0:(file.c:2413:ll_inode_revalidate_fini()) failure -43 inode 27616681
LustreError: 9129:0:(file.c:2413:ll_inode_revalidate_fini()) failure -43 inode 27616681
...

where error 43 (EIDRM) means "Identifier removed". There are no error messages on the MDS or the OSSes.
Setup:
Client: 2.6.9-55.0.9.EL_lustre.1.6.3smp (RHEL4)
1 x MDS: 2.6.18-8.1.14.el5_lustre.1.6.4.2smp (RHEL5)
4 x OSS with 2 OSTs each: 2.6.18-8.1.14.el5_lustre.1.6.4.2smp (RHEL5)

Thanks,
Per Lundqvist

-- 
Per Lundqvist
National Supercomputer Centre
Linköping University, Sweden
http://www.nsc.liu.se
On Feb 11, 2008 17:04 +0100, Per Lundqvist wrote:
> this seems to be happen almost all the time when I am running as a
> specific user on this system. Note that the stat call always works... I
> haven't yet been able to reproduce this problem when running as my own
> user.

EIDRM (Identifier removed) means that your MDS has a user database (/etc/passwd and /etc/group) that is missing the particular user ID.

Cheers, Andreas
-- 
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
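Andreas's diagnosis can be checked directly on the MDS by asking whether the UID and GID that `stat` reported (1120/faxen and 500/nsc above) resolve there. The following is a minimal sketch, assuming a POSIX shell with `getent` and `id` available on the MDS; the function name `check_ids` is hypothetical and not part of Lustre. It queries through NSS, so it also sees users served by LDAP or NIS, not only the local /etc/passwd and /etc/group files. The supplementary-group list it prints is roughly the information the MDS group upcall has to find for that user.

```shell
#!/bin/sh
# check_ids UID GID  (hypothetical helper, not a Lustre tool)
# Run on the MDS: reports whether a numeric UID and GID resolve through
# NSS (getent consults /etc/passwd, /etc/group and any LDAP/NIS sources).
# Returns 0 if both resolve, 1 if either one is missing.
check_ids() {
    rc=0
    if pw=$(getent passwd "$1"); then
        user=${pw%%:*}                  # first passwd field is the user name
        # id -G lists the numeric primary and supplementary group IDs
        echo "uid $1 -> $user, groups: $(id -G "$user")"
    else
        echo "uid $1: MISSING"
        rc=1
    fi
    if gr=$(getent group "$2"); then
        echo "gid $2 -> ${gr%%:*}"      # first group field is the group name
    else
        echo "gid $2: MISSING"
        rc=1
    fi
    return "$rc"
}
```

On the MDS, `check_ids 1120 500` should print both names; a MISSING line for UID 1120 would be consistent with the EIDRM errors Per saw for that user.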
Is this an error one would see on orphaned files with stat, ls -l, etc.?

Klaus

----- Original Message -----
From: lustre-discuss-bounces at lists.lustre.org <lustre-discuss-bounces at lists.lustre.org>
To: Per Lundqvist <perl at nsc.liu.se>
Cc: Lustre Discuss <lustre-discuss at lists.lustre.org>
Sent: Mon Feb 11 13:11:45 2008
Subject: Re: [Lustre-discuss] rc -43: Identifier removed

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
I had the same issue with my Lustre setup. I think this should fix it:

tunefs.lustre --param mdt.group_upcall=NONE /dev/mdt/device

On Feb 11, 2008, at 7:18 PM, Steden Klaus wrote:
> Is this an error one would see on orphaned files with stat, ls -l,
> etc?

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies
(301) 595-7000
aaron at iges.org
On Mon, 11 Feb 2008, Aaron Knister wrote:
> I had the same issue with my lustre setup. I think this should fix it --
>
> tunefs.lustre --param mdt.group_upcall=NONE /dev/mdt/device

Thanks Andreas and Aaron, but why does the MDS need to have all the users in its own passwd/group files? And what are the implications of setting mdt.group_upcall=NONE on the MDT?

/Per
What the group upcall does is get all the secondary groups for the client user. There isn't enough room in the LNET message to send them all, so the MDS has to look them up itself in /etc/group. If you don't care about secondary groups at all, there is no harm in clearing the group_upcall parameter.

In theory, there also shouldn't be any harm in having different passwd and group files on the MDS and OSSes than on the clients. It's highly important, however, that all the clients have the same passwd and group files. Otherwise the clients could interpret the same UID as different users, and people could go mucking around in each other's files.

- Kit
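Kit's requirement that every client agree on who owns each UID can be checked mechanically. Below is a minimal sketch, assuming passwd-format files from two nodes have already been copied to one machine (gathering them, e.g. with scp, is left out); the function name `compare_uid_maps` is hypothetical. Because /etc/group also keeps the name in field 1 and the numeric ID in field 3, the same function works on group files too.

```shell
#!/bin/sh
# compare_uid_maps FILE_A FILE_B  (hypothetical helper)
# Prints every numeric ID (field 3) that maps to a different name (field 1)
# in two passwd- or group-format files. Prints nothing if they agree.
# Caveat: FILE_A must be non-empty, because the NR == FNR idiom relies on it.
compare_uid_maps() {
    awk -F: '
        NR == FNR { name[$3] = $1; next }       # first file: remember id -> name
        ($3 in name) && name[$3] != $1 {        # second file: same id, other name
            printf "id %s: %s vs %s\n", $3, name[$3], $1
        }
    ' "$1" "$2"
}
```

For example, after copying a client's /etc/passwd to /tmp/passwd.n50, `compare_uid_maps /etc/passwd /tmp/passwd.n50` lists any UID that node n50 would attribute to a different user. IDs present in only one of the files are not reported; that would be a separate check.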
On Tue, 12 Feb 2008, Kit Westneat wrote:
> What the group upcall does is get all the secondary groups for the client
> user.

OK, thanks for clarifying this.

/Per