Could someone help me understand what the following error messages mean?

When I do an "ls -l" I get ? in all the fields except the filename for some of the files:

    ?--------- ? ? ? ? ? cart_wengen.odd_even.txt

I recently moved the MDT to a different partition. I performed an lfsck on the file system and got these errors from lfsck:

    /aa/bb/cc/dd/xx/cosmology.c object 8024 not created

I keep getting the following error message on all the OSTs:

    Mar 27 08:21:57 lustre2 kernel: LustreError: 6886:0:(ldlm_resource.c:858:ldlm_resource_add()) lvbo_init failed for resource 301930: rc -2

TIA
Nirmal

On Fri, 2009-03-27 at 09:31 -0500, Nirmal Seenu wrote:
> Could someone help me understand what the following error messages mean:
>
> When I do a "ls -l" I get ? in all the fields but for the filename on
> some of the files:
>
> ?--------- ? ? ? ? ? cart_wengen.odd_even.txt

Looks like you have some synchronization problems between your MDT and your OSTs.

> I recently moved the MDT to a different partition.

Hrm. How did you do that? Did you lose some information in the move perhaps?

> I performed a lfsck
> on the file system and got these errors from lfsck:
>
> /aa/bb/cc/dd/xx/cosmology.c object 8024 not created

Like just that one or lots of them? I'd think you should see one for that file you reference above: cart_wengen.odd_even.txt

In any case, that message means you have metadata but no object data for a file.

> I keep getting the following error message on all the OSTs:
> Mar 27 08:21:57 lustre2 kernel: LustreError:
> 6886:0:(ldlm_resource.c:858:ldlm_resource_add()) lvbo_init failed for
> resource 301930: rc -2

Yeah. Same thing. A client is asking for an object that doesn't exist.

Assuming you don't have some ongoing problem with OST objects going missing for whatever reason, lfsck should clear all of this up.

b.

>> I recently moved the MDT to a different partition.
>
> Hrm. How did you do that? Did you lose some information in the move perhaps?

It was different partitions on the same machine. I did the following from an LVM snapshot of the old MDT, which was mounted as ldiskfs:

    (tar --sparse -cf - . | ( cd /mnt/new-mdt; tar -xf -)) &

and then brought the entire Lustre filesystem down and did a final rsync after mounting both the old and new MDT as ldiskfs:

    rsync -aSv /mnt/mdt/ /mnt/new-mdt

and then I did a getfattr, setfattr and "rm OBJECTS/* CATALOGS".

>> I performed a lfsck
>> on the file system and got these errors from lfsck:
>>
>> /aa/bb/cc/dd/xx/cosmology.c object 8024 not created
> Like just that one or lots of them? I'd think you should see one for
> that file you reference above: cart_wengen.odd_even.txt

cart_wengen.odd_even.txt is a new file that was created by a user after the Lustre filesystem was brought back online following the lfsck. That is, I don't have an entry for cart_wengen.odd_even.txt in the lfsck output, but the "ls -l" output lists the same file with ?.

During lfsck, 6612 files had the error "object not created". In the "ls -lR" output on the Lustre client, 6774 files have ? in their output.

All our OSTs are healthy RAID6 volumes on SATABeasts and there is no known hardware problem. No errors are reported when I run e2fsck against each individual filesystem.

Not much I/O has happened since the last time I did an lfsck (last night). Would another lfsck actually help in fixing this problem?

Thanks
Nirmal

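One way to re-check the count of "?" entries between lfsck runs is to count them straight from the "ls -lR" output. This is only a minimal sketch: it assumes ls prints "?" as the first character of the mode field, as in the sample line earlier in the thread, and count_unknown is a hypothetical helper name, not a Lustre tool.

```shell
#!/bin/sh
# Count lines of "ls -l"-style output read from stdin whose mode field
# starts with "?", i.e. entries that ls could not stat.
count_unknown() {
    grep -c '^[?]'
}

# Example with two sample lines; only the first one matches.
printf '%s\n' \
    '?--------- ? ? ? ? ? cart_wengen.odd_even.txt' \
    '-rw-r--r-- 1 user group 0 Mar 27 08:21 ok.txt' | count_unknown
```

On a client this could be fed with something like "ls -lR /mnt/lustre 2>/dev/null | count_unknown", where /mnt/lustre is an assumed mount point. Comparing the number before and after an lfsck run shows whether the set of affected files is still growing.
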
On Fri, 2009-03-27 at 10:21 -0500, Nirmal Seenu wrote:
>
> It was different partitions on the same machine. I did the following
> from a LVM snapshot

So you took a copy of an LVM snapshot of an MDT on a live filesystem and then replaced the MDT that you took the snapshot of with the copy on the new partition?

> of the old MDT which got mounted as ldiskfs
>
> (tar --sparse -cf - . | ( cd /mnt/new-mdt; tar -xf -)) &

So you didn't make a backup of the extended attributes too?

> and then brought the entire Lustre filesystem down and did a final rsync
> after mounting both the old and new MDT as ldiskfs:
>
> rsync -aSv /mnt/mdt/ /mnt/new-mdt

Again, no extended attribute copying.

> and then I did a getfattr, setfattr and "rm OBJECTS/* CATALOGS"

Hrm. While it seems that you might have done the right thing, it's not clear from your brief description that you did the right things. Did you use the process in the operations manual as a model for your process?

> cart_wengen.odd_even.txt is a new file that was created by a user after
> the lustre filesystem was brought online after performing a lfsck. i.e.
> I don't have an entry for cart_wengen.odd_even.txt in lfsck output, but
> the "ls -l" output lists the same file with ?.

Well, sounds like objects are getting removed or going missing by some means.

> Not much I/O has happened since the last time I did a lfsck(last night).
> Would another lfsck actually help in fixing this problem?

It's difficult to tell, as it's not clear why you are still getting errors after your last lfsck. It certainly could not hurt if your goal is to just get the MDT and OSTs back in sync.

b.

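For reference, the extended-attribute step being asked about can be laid out as below. This is a hedged sketch of a file-level MDT copy with both MDTs mounted as ldiskfs and the Lustre filesystem stopped, not a record of what was actually run in this thread. The function name mdt_copy_plan and the path /tmp/ea.bak are made up for illustration, and the function only prints the commands so they can be reviewed before anything is executed.

```shell
#!/bin/sh
# Print (do not run) a file-level MDT copy plan that carries the extended
# attributes across separately, since the tar and rsync commands quoted
# above do not copy them. All three arguments are example paths.
mdt_copy_plan() {
    src=$1   # old MDT, mounted as ldiskfs
    dst=$2   # new MDT, mounted as ldiskfs
    ea=$3    # file that will hold the extended-attribute dump
    cat <<EOF
cd $src && getfattr -R -d -m '.*' -e hex -P . > $ea
cd $src && tar --sparse -cf - . | (cd $dst && tar -xf -)
cd $dst && setfattr --restore=$ea
cd $dst && rm OBJECTS/* CATALOGS
EOF
}

# Show the plan for the mount points used in this thread.
mdt_copy_plan /mnt/mdt /mnt/new-mdt /tmp/ea.bak
```

The point of the ordering is that the attribute dump is taken from the same quiesced source the files are copied from, and restored only after the file copy has finished. Dumping attributes from a snapshot and then topping up files with a later rsync can leave the two out of step.
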
> So you took a copy of an LVM snapshot of an MDT on a live filesystem
> and then replaced the MDT that you took the snapshot of with the copy
> on the new partition?

I did most of the work of copying the sparse files on the live system.

> Hrm. While it seems that you might have done the right thing it's
> not clear from your brief description that you did the right things.
> Did you use the process in the operations manual as a model for your
> process?

I was following the instructions in your manual on how to back up and restore. I have done a few successful MDT moves before, but none that were done on a live system as in this case.

I have decided to start over on the MDT move and make the move on a filesystem that is not live. I will perform another lfsck on the new MDT once the move is completed.

Thanks.
Nirmal