Gonçalo Borges
2009-Jan-12 18:21 UTC
[Lustre-discuss] LustreError: The mds_getxattr operation failed with -43
Hi all,

I'm having the following problem:

- It seems my clients are not able to reach my MDT. If you run "dmesg" on a Linux client machine, you get:

---*---
LustreError: 11-0: an error occurred while communicating with 172.30.1.209@tcp. The mds_getxattr operation failed with -43
LustreError: Skipped 1 previous similar message
LustreError: 2472:0:(dir.c:388:ll_readdir()) error reading dir 6422529/2008980280 page 0: rc -43
LustreError: 2623:0:(dir.c:388:ll_readdir()) error reading dir 6422529/2008980280 page 0: rc -43
---*---

Probably as a consequence, some users sometimes cannot list directory contents. See the following test:

[cmsprd081@srm01 ~]$ ll /lustre/lip.pt/data/cms/store/unmerged/SAM/testSRM/SAM-srm01.lip.pt/lcg-util
ls: /lustre/lip.pt/data/cms/store/unmerged/SAM/testSRM/SAM-srm01.lip.pt/lcg-util: Identifier removed
[cmsprd081@srm01 ~]$ exit
logout
[root@srm01 ~]# ll /lustre/lip.pt/data/cms/store/unmerged/SAM/testSRM/SAM-srm01.lip.pt/lcg-util
total 0
-rw-rwx---+ 1 storm storm 0 Jan 12 16:05 testfile-cp-20090112-170527.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:04 testfile-cp-20090112-180358.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:05 testfile-cp-CMS_DEFAULT-20090112-170527.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:04 testfile-cp-CMS_DEFAULT-20090112-180358.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:06 testfile-gt-20090112-170618.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:05 testfile-gt-20090112-180445.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:06 testfile-gt-CMS_DEFAULT-20090112-170618.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:04 testfile-gt-CMS_DEFAULT-20090112-180445.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:06 testfile-gt-rm-gt-20090112-170557.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:04 testfile-gt-rm-gt-20090112-180427.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:05 testfile-gt-rm-gt-CMS_DEFAULT-20090112-170557.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:04 testfile-gt-rm-gt-CMS_DEFAULT-20090112-180427.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:06 testfile-ls-20090112-170627.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:05 testfile-ls-20090112-180516.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:06 testfile-ls-CMS_DEFAULT-20090112-170627.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:05 testfile-ls-CMS_DEFAULT-20090112-180516.txt
[root@srm01 ~]# su -l cmsprd081
[cmsprd081@srm01 ~]$ ll /lustre/lip.pt/data/cms/store/unmerged/SAM/testSRM/SAM-srm01.lip.pt/lcg-util
total 0
-rw-rwx---+ 1 storm storm 0 Jan 12 16:05 testfile-cp-20090112-170527.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:04 testfile-cp-20090112-180358.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:05 testfile-cp-CMS_DEFAULT-20090112-170527.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:04 testfile-cp-CMS_DEFAULT-20090112-180358.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:06 testfile-gt-20090112-170618.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:05 testfile-gt-20090112-180445.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:06 testfile-gt-CMS_DEFAULT-20090112-170618.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:04 testfile-gt-CMS_DEFAULT-20090112-180445.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:06 testfile-gt-rm-gt-20090112-170557.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:04 testfile-gt-rm-gt-20090112-180427.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:05 testfile-gt-rm-gt-CMS_DEFAULT-20090112-170557.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:04 testfile-gt-rm-gt-CMS_DEFAULT-20090112-180427.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:06 testfile-ls-20090112-170627.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:05 testfile-ls-20090112-180516.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 16:06 testfile-ls-CMS_DEFAULT-20090112-170627.txt
-rw-rwx---+ 1 storm storm 0 Jan 12 17:05 testfile-ls-CMS_DEFAULT-20090112-180516.txt
[cmsprd081@srm01 ~]$

In summary, the first time user cmsprd081 tried to list the directory, he got an "Identifier removed" error. The second time, he was able to do it...

I need suggestions here, or things to try...

Thanks in advance,
Gonçalo
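The number in "failed with -43" and "rc -43" is a negated errno value, and on Linux errno 43 is EIDRM, whose C library message is the same "Identifier removed" that ls printed. In other words, the client's dmesg errors and the user's failed listing look like the same failure surfacing in two places. The Python sketch below is not a Lustre tool; the helper names (decode_rc, decode_lustre_errors) are made up for illustration. It pulls these codes out of dmesg-style lines and decodes them. Errno numbering is platform-specific, so it assumes it runs on a Linux host like the clients.

```python
import errno
import os
import re

# Matches the two forms seen in the client's dmesg output:
#   "... The mds_getxattr operation failed with -43"
#   "... error reading dir 6422529/2008980280 page 0: rc -43"
RC_PATTERN = re.compile(r"\b(?:failed with|rc) (-\d+)")


def decode_rc(rc: int) -> tuple[str, str]:
    """Map a kernel-style (negative) return code to (errno name, libc message)."""
    code = -rc if rc < 0 else rc
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def decode_lustre_errors(lines):
    """Yield (line, rc, name, message) for each line that carries a return code."""
    for line in lines:
        m = RC_PATTERN.search(line)
        if m:
            rc = int(m.group(1))
            yield (line, rc, *decode_rc(rc))


if __name__ == "__main__":
    sample = [
        "LustreError: 11-0: an error occurred while communicating with "
        "172.30.1.209@tcp. The mds_getxattr operation failed with -43",
        "LustreError: Skipped 1 previous similar message",
        "LustreError: 2472:0:(dir.c:388:ll_readdir()) error reading dir "
        "6422529/2008980280 page 0: rc -43",
    ]
    for line, rc, name, msg in decode_lustre_errors(sample):
        print(rc, name, msg)
```

On a Linux host this should print "-43 EIDRM Identifier removed" for the two lines that carry a code; the "Skipped" line has none and is ignored.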
Andreas Dilger
2009-Jan-13 10:41 UTC
On Jan 12, 2009 18:21 +0000, Gonçalo Borges wrote:
> - It seems my clients are not able to reach my mdt. If you do a "dmesg"
> in a client linux machine, you will get:
>
> ---*---
> LustreError: 11-0: an error occurred while communicating with
> 172.30.1.209@tcp. The mds_getxattr operation failed with -43
> LustreError: Skipped 1 previous similar message
> LustreError: 2472:0:(dir.c:388:ll_readdir()) error reading dir
> 6422529/2008980280 page 0: rc -43
> LustreError: 2623:0:(dir.c:388:ll_readdir()) error reading dir
> 6422529/2008980280 page 0: rc -43
> ---*---
>
> Probably, as a consequence, sometimes some of the users can not list
> directory contents. Check the following test:

You need to configure l_getgroups - please search docs/archives for details.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
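l_getgroups is the group upcall that the MDS uses to look up a user's supplementary groups. Before configuring it, a cheap sanity check is whether the MDS node can resolve the affected user (here cmsprd081) and its groups at all. The Python sketch below assumes it runs on the MDS with the same name-service sources (e.g. /etc/passwd, LDAP) the upcall would see. It uses only the standard pwd, grp and os.getgrouplist calls and does not reproduce what l_getgroups itself does; the function name resolve_groups is hypothetical.

```python
import grp
import os
import pwd
import sys


def resolve_groups(user: str) -> list[str]:
    """Return the names of all groups `user` belongs to, as this host's NSS sees them.

    Raises KeyError if the user is unknown on this host.
    """
    pw = pwd.getpwnam(user)
    # Primary gid plus every supplementary gid that NSS reports for the user.
    gids = os.getgrouplist(pw.pw_name, pw.pw_gid)
    names = []
    for gid in gids:
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # gid without a group entry on this host: show the raw number.
            names.append(str(gid))
    return names


if __name__ == "__main__":
    user = sys.argv[1] if len(sys.argv) > 1 else "cmsprd081"
    try:
        print(user, "->", ", ".join(resolve_groups(user)))
    except KeyError:
        print(f"{user}: not found by NSS on this host")
```

If the user or some of its groups are missing on the MDS while present on the clients, that mismatch is worth fixing alongside the upcall configuration.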