I got this error today when testing a newly set up 1.6 filesystem:
n50 1% cd /mnt/test
n50 2% ls
ls: reading directory .: Identifier removed
n50 3% ls -alrt
total 8
?--------- ? ? ? ? ? dir1
?--------- ? ? ? ? ? dir2
drwxr-xr-x 4 root root 4096 Feb 8 15:46 ../
drwxr-xr-x 4 root root 4096 Feb 11 15:11 ./
n50 4% stat .
File: `.'
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: b438c888h/-1271347064d Inode: 27616681 Links: 2
Access: (0755/drwxr-xr-x) Uid: ( 1120/ faxen) Gid: ( 500/ nsc)
Access: 2008-02-11 16:11:48.336621154 +0100
Modify: 2008-02-11 15:11:27.000000000 +0100
Change: 2008-02-11 15:11:31.352841294 +0100
This seems to happen almost all the time when I am running as a
specific user on this system. Note that the stat call always works... I
haven't yet been able to reproduce this problem when running as my own
user.
dmesg from client:
LustreError: 9000:0:(dir.c:406:ll_readdir()) error reading dir
27583921/2381382571 page 0: rc -43
LustreError: 9019:0:(dir.c:406:ll_readdir()) error reading dir
27583921/2381382571 page 0: rc -43
LustreError: 9020:0:(dir.c:406:ll_readdir()) error reading dir
27583921/2381382571 page 0: rc -43
LustreError: 9021:0:(dir.c:406:ll_readdir()) error reading dir
27583921/2381382571 page 0: rc -43
LustreError: 9022:0:(dir.c:406:ll_readdir()) error reading dir
4848481/4054352687 page 0: rc -43
LustreError: 9127:0:(file.c:2413:ll_inode_revalidate_fini()) failure -43
inode 27616681
LustreError: 9128:0:(file.c:2413:ll_inode_revalidate_fini()) failure -43
inode 27616681
LustreError: 9129:0:(file.c:2413:ll_inode_revalidate_fini()) failure -43
inode 27616681
...
where error 43 means: Identifier removed (EIDRM).
There are no error messages from the MDS or the OSSes.
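
For reference, a minimal sketch of decoding a kernel-style return code like the "rc -43" in the log above. It assumes a Linux host, where errno 43 is EIDRM; the helper name describe_rc is made up for this illustration and is not part of Lustre.

```python
import errno
import os


def describe_rc(rc):
    """Translate a negative return code (e.g. -43) into 'NAME (message)'."""
    code = abs(rc)
    # errno.errorcode maps the numeric value to its symbolic name on this platform.
    name = errno.errorcode.get(code, "unknown")
    return f"{name} ({os.strerror(code)})"


# The code reported by ll_readdir() in the client log.
print(describe_rc(-43))
```

The numeric values are platform-specific, so the lookup should be run on the same kind of system that produced the log.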
Setup:
Client: 2.6.9-55.0.9.EL_lustre.1.6.3smp (rhel4)
1 x MDS: 2.6.18-8.1.14.el5_lustre.1.6.4.2smp (rhel5)
4 x OSS with 2 OSTs each: 2.6.18-8.1.14.el5_lustre.1.6.4.2smp (rhel5)
Thanks,
Per Lundqvist
--
Per Lundqvist
National Supercomputer Centre
Linköping University, Sweden
http://www.nsc.liu.se
On Feb 11, 2008 17:04 +0100, Per Lundqvist wrote:
> I got this error today when testing a newly set up 1.6 filesystem:
>
> n50 2% ls
> ls: reading directory .: Identifier removed
>
> [...]
>
> This seems to happen almost all the time when I am running as a
> specific user on this system. Note that the stat call always works... I
> haven't yet been able to reproduce this problem when running as my own
> user.

EIDRM (Identifier removed) means that your MDS has a user database
(/etc/passwd and /etc/group) that is missing the particular user ID.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
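
A quick way to check for the condition described above is to look up the affected IDs on the MDS itself. The following is only a sketch using Python's standard pwd and grp modules; the helper name missing_ids is hypothetical, and the IDs 1120 and 500 are taken from the stat output in the original report. Both modules go through the system's name service, so they also see NIS or LDAP entries if the MDS is configured to use them, not just the local files.

```python
import grp
import pwd


def missing_ids(uid, gid):
    """Return a list of problems found when resolving uid and gid locally."""
    problems = []
    try:
        pwd.getpwuid(uid)
    except KeyError:
        problems.append(f"uid {uid} is not in the passwd database")
    try:
        grp.getgrgid(gid)
    except KeyError:
        problems.append(f"gid {gid} is not in the group database")
    return problems


# Run on the MDS: an empty list means both IDs resolve there.
print(missing_ids(1120, 500))
```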
Is this an error one would see on orphaned files with stat, ls -l, etc?

Klaus

----- Original Message -----
From: lustre-discuss-bounces at lists.lustre.org <lustre-discuss-bounces at lists.lustre.org>
To: Per Lundqvist <perl at nsc.liu.se>
Cc: Lustre Discuss <lustre-discuss at lists.lustre.org>
Sent: Mon Feb 11 13:11:45 2008
Subject: Re: [Lustre-discuss] rc -43: Identifier removed

On Feb 11, 2008 17:04 +0100, Per Lundqvist wrote:
> [...]

EIDRM (Identifier removed) means that your MDS has a user database
(/etc/passwd and /etc/group) that is missing the particular user ID.

Cheers, Andreas

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
I had the same issue with my Lustre setup. I think this should fix it:

tunefs.lustre --param mdt.group_upcall=NONE /dev/mdt/device

On Feb 11, 2008, at 7:18 PM, Steden Klaus wrote:
> Is this an error one would see on orphaned files with stat, ls -l, etc?
>
> Klaus
>
> [...]
>
> EIDRM (Identifier removed) means that your MDS has a user database
> (/etc/passwd and /etc/group) that is missing the particular user ID.
>
> Cheers, Andreas

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies
(301) 595-7000
aaron at iges.org
On Mon, 11 Feb 2008, Aaron Knister wrote:
> I had the same issue with my lustre setup. I think this should fix it --
>
> tunefs.lustre --param mdt.group_upcall=NONE /dev/mdt/device

Thanks Andreas and Aaron, but then I wonder why the MDS needs to have all
the users in its own passwd/group file? And what are the implications of
setting the above mdt.group_upcall=NONE on the MDT?

/Per
--
Per Lundqvist
National Supercomputer Centre
Linköping University, Sweden
http://www.nsc.liu.se
What the group upcall does is get all the secondary groups for the client
user. There isn't enough room in the LNET message to send them all, so the
MDS has to look them up in /etc/group. If you don't care about secondary
groups at all, there is no harm in clearing the group_upcall param.

In theory, there also shouldn't be any harm in having different passwd and
group files on the MDS and OSSes than on the clients. It's highly important,
however, that all the clients have the same passwd and group files. Otherwise
the clients could interpret the same UID as different users, and people could
go mucking around in each other's files.

- Kit

Per Lundqvist wrote:
> On Mon, 11 Feb 2008, Aaron Knister wrote:
>> I had the same issue with my lustre setup. I think this should fix it --
>>
>> tunefs.lustre --param mdt.group_upcall=NONE /dev/mdt/device
>
> Thanks Andreas and Aaron, but then I wonder why the MDS needs to have all
> the users in its own passwd/group file? And what are the implications of
> setting the above mdt.group_upcall=NONE on the MDT?
>
> /Per
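
To make the secondary-group lookup concrete, here is a rough sketch of the kind of query a server would have to answer for a user from its own databases. This is not Lustre's actual upcall helper; it only uses Python's standard os.getgrouplist, and the function name secondary_groups is made up for this example.

```python
import os
import pwd


def secondary_groups(username):
    """Return the sorted GIDs the user belongs to, excluding the primary group."""
    entry = pwd.getpwnam(username)  # raises KeyError if the user is unknown here
    # getgrouplist returns every group the user is a member of,
    # including the primary GID passed in as the second argument.
    gids = os.getgrouplist(username, entry.pw_gid)
    return sorted(g for g in gids if g != entry.pw_gid)
```

Calling secondary_groups("faxen") on the MDS would raise KeyError if that user is missing from the server's database, which is the same situation the EIDRM error reports.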
On Tue, 12 Feb 2008, Kit Westneat wrote:
> What the group upcall does is get all the secondary groups for the client
> user. There isn't enough room in the LNET message to send them all, so the
> MDS has to look them up in /etc/group. If you don't care about secondary
> groups at all, there is no harm in clearing the group_upcall param.
>
> In theory, there also shouldn't be any harm in having different passwd and
> group files on the MDS and OSSes than on the clients. It's highly important,
> however, that all the clients have the same passwd and group files. Otherwise
> the clients could interpret the same UID as different users, and people could
> go mucking around in each other's files.

OK, thanks for clarifying this.

/Per
--
Per Lundqvist
National Supercomputer Centre
Linköping University, Sweden
http://www.nsc.liu.se
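
The point about keeping passwd files identical across clients can be checked mechanically. The sketch below compares two passwd-format texts and reports UIDs that map to different user names. The function names and the second sample entry ("other") are hypothetical; only the faxen/1120/500 values come from this thread.

```python
def parse_passwd(text):
    """Map UID -> user name from passwd-format text (name:pw:uid:gid:...)."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip malformed entries
        # Keep the first name seen for a UID, as a passwd lookup by UID would.
        users.setdefault(int(fields[2]), fields[0])
    return users


def conflicting_uids(passwd_a, passwd_b):
    """Return {uid: (name_a, name_b)} for UIDs that name different users."""
    a, b = parse_passwd(passwd_a), parse_passwd(passwd_b)
    return {uid: (a[uid], b[uid]) for uid in a.keys() & b.keys() if a[uid] != b[uid]}


client_a = "root:x:0:0:root:/root:/bin/sh\nfaxen:x:1120:500::/home/faxen:/bin/sh\n"
client_b = "root:x:0:0:root:/root:/bin/sh\nother:x:1120:500::/home/other:/bin/sh\n"
print(conflicting_uids(client_a, client_b))
```

Any UID reported here would be treated as a different user depending on which client is used.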