We added a new OSS to our 1.8.4 Lustre installation. It has 6 OSTs of 8.9TB each. Within a day of having these on-line, one OST stopped accepting new files. I cannot get it to activate. The other 5 seem fine.

On the MDS, "lctl dl" shows it IN, but not UP, and files can be read from it:

  33 IN osc umt3-OST001d-osc umt3-mdtlov_UUID 5

However, I cannot get it to re-activate:

  lctl --device umt3-OST001d-osc activate

This returns no errors, but dmesg on the MDS shows this as a result:

  [603128.578862] Lustre: umt3-OST001d-osc: Connection restored to service umt3-OST001d using nid 10.10.2.23@tcp.
  [603128.578865] Lustre: Skipped 1 previous similar message
  [603128.579251] Lustre: MDS umt3-MDT0000: umt3-OST001d_UUID now active, resetting orphans
  [603128.579256] Lustre: Skipped 1 previous similar message
  [603128.579608] LustreError: 9655:0:(osc_create.c:589:osc_create()) umt3-OST001d-osc: oscc recovery failed: -22
  [603128.579616] LustreError: 9655:0:(lov_obd.c:1134:lov_clear_orphans()) error in orphan recovery on OST idx 29/34: rc = -22
  [603128.579623] LustreError: 9655:0:(mds_lov.c:1057:__mds_lov_synchronize()) umt3-OST001d_UUID failed at mds_lov_clear_orphans: -22
  [603128.579628] LustreError: 9655:0:(mds_lov.c:1066:__mds_lov_synchronize()) umt3-OST001d_UUID sync failed -22, deactivating

On the OSS itself, I see these related entries appear:

  Lustre: 4642:0:(ldlm_lib.c:572:target_handle_reconnect()) umt3-OST001d: umt3-mdtlov_UUID reconnecting
  Lustre: 4642:0:(ldlm_lib.c:572:target_handle_reconnect()) Skipped 1 previous similar message
  Lustre: umt3-OST001d: received MDS connection from 10.10.1.49@tcp
  Lustre: Skipped 1 previous similar message
  LustreError: 4697:0:(filter.c:3172:filter_handle_precreate()) umt3-OST001d: ignoring bogus orphan destroy request: obdid 11309489156331498430 last_id 0

Can anyone tell me what must be done to recover this disk volume?

Thanks,
bob
On Friday, September 03, 2010, Bob Ball wrote:
> We added a new OSS to our 1.8.4 Lustre installation. It has 6 OSTs of
> 8.9TB each. Within a day of having these on-line, one OST stopped
> accepting new files. I cannot get it to activate. The other 5 seem fine.
>
> On the MDS, "lctl dl" shows it IN, but not UP, and files can be read from
> it: 33 IN osc umt3-OST001d-osc umt3-mdtlov_UUID 5
>
> However, I cannot get it to re-activate:
>   lctl --device umt3-OST001d-osc activate
[...]
> LustreError: 4697:0:(filter.c:3172:filter_handle_precreate())
> umt3-OST001d: ignoring bogus orphan destroy request: obdid
> 11309489156331498430 last_id 0
>
> Can anyone tell me what must be done to recover this disk volume?

Check out section 23.3.9 in the Lustre manual ("How to Fix a Bad LAST_ID on an OST").

It is on my TODO list to write a tool to automatically correct the "lov_objid" file, but as of now I don't have it yet. Somehow your lov_objid file has a completely wrong value for this OST.

Now, when you say "files can be read from it", are you sure there are already files on that OST? The error message says that the last_id is zero, so you should not have a single file on it. If that is also wrong, you will need to correct it as well. You can do that manually, or you can use a patched e2fsprogs version that will do it for you.

Patches are here:
https://bugzilla.lustre.org/show_bug.cgi?id=22734

Packages can be found on my home page:
http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/e2fsprogs/

If you want to do it automatically, you will need to create an lfsck mdsdb file (the hdr file is sufficient, see the lfsck section in the manual), and then you will need to run e2fsck for that OST as if you wanted to create an OSTDB file. That will start pass6, and if you then run e2fsck *without* "-n", i.e. in correcting mode, it will correct the LAST_ID file to what it finds on disk. With "-v" it will also tell you the old and the new value, and then you will need to put that value, properly coded, into the MDS lov_objid file.

Be careful and create backups of the lov_objid and LAST_ID files.

Hope it helps,
Bernd

--
Bernd Schubert
DataDirect Networks
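[A rough sketch of the sequence described above, for reference. It assumes the targets can be unmounted; /dev/mdtdev and /dev/sdc are example device names, not the actual hardware in this thread.]

  # On the MDS, with the MDT unmounted: build the mdsdb (read-only run).
  # Only the hdr portion of the database is needed for the LAST_ID fix.
  e2fsck -n -v --mdsdb /tmp/mdsdb /dev/mdtdev

  # Copy /tmp/mdsdb* to the OSS, then run e2fsck on the bad OST as if
  # building an ostdb, but WITHOUT -n so pass6 can correct LAST_ID on disk.
  # -v prints the old and new LAST_ID values for later use in lov_objid.
  e2fsck -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb /dev/sdc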
On Friday, September 03, 2010, Bernd Schubert wrote:
> If you want to do it automatically, you will need to create an lfsck mdsdb
> file (the hdr file is sufficient, see the lfsck section in the manual), and
> then you will need to run e2fsck for that OST as if you wanted to create an
> OSTDB file. [...] then you will need to put that value, properly coded, into
> the MDS lov_objid file.

An update on the lov_objid file: actually, if you rename or delete it (rename it, please, so that you have a backup), the MDS should be able to re-create it from the OST LAST_ID data. So if the troublesome OST has no data yet, it will be very easy; if it already has data, you will need to correct the LAST_ID on that OST first.

Cheers,
Bernd

--
Bernd Schubert
DataDirect Networks
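[A minimal sketch of that rename-with-backup, assuming Lustre is stopped and the MDT can be mounted as ldiskfs; the device path and mount point below are examples.]

  # Lustre stopped; mount the MDT backing filesystem directly.
  mount -t ldiskfs /dev/mdtdev /mnt/mdt
  cp -a /mnt/mdt/lov_objid /root/lov_objid.backup   # keep a copy off the MDT as well
  mv /mnt/mdt/lov_objid /mnt/mdt/lov_objid.orig     # rename rather than delete
  umount /mnt/mdt
  # On the next MDS mount, lov_objid should be rebuilt from each OST's LAST_ID.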
Thank you, Bernd. "df" claims there is some 442MB of data on the volume, compared to neighbors with 285GB. That could well be a fragment of a single, unsuccessful transfer attempt. I can run "lfs find" on it, though, and see what comes back. I was having problems earlier and thought I got files back from that command, but other problems on our cluster confused that result. We will recheck.

bob
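[For reference, the sort of commands that check this; the OST UUID matches the one in the thread, but /lustre is an example client mount point.]

  # Per-OST space usage as seen from a client.
  lfs df -h /lustre

  # List files that have at least one object on the suspect OST.
  lfs find --obd umt3-OST001d_UUID /lustre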
OK, I tried this morning to follow the information/procedures from section 23.3.9 of the user manual, and succeeded in confusing myself admirably.

I took Lustre completely offline, then first checked the LAST_ID on a known, good OST. I found this kind of thing:

  # od -Ax -td4 /mnt/ost/last_rcvd | more
  (For ost11, the 8c index value is 28.)

  debugfs -c -R 'dump /O/0/LAST_ID /tmp/LAST_ID' /dev/sdb ; od -Ax -td8 /tmp/LAST_ID
  debugfs 1.41.10.sun2 (24-Feb-2010)
  /dev/sdb: catastrophic mode - not reading inode or group bitmaps
  000000                68321
  000008

  (gdb) p /x 68321
  $1 = 0x10ae1
  (gdb)

  [root@umdist03 ~]# cat /tmp/LAST_ID.asc
  0000000: e10a 0100 0000 0000  á.......

From this I can see how to edit the LAST_ID.asc for use in the repair procedure. All other information was consistent; the /tmp/objects.sdb listing ended with this LAST_ID object.

Now, we move on to the "bad" OST. First, I did an "lfs find" yesterday on just this OST and came up with some 8000 files before it seemed to cease output. So, I expected to see SOMETHING on the physical disk. But, in fact, the /tmp/objects.sdc listing showed no content whatsoever? Just blank lines, and a direct look at the ls output confirmed that. And so, the confusion began.

LAST_ID is, indeed, zero.

I am checking with my co-conspirator in this. It is _possible_ that this OST ID was re-used after a machine was dropped from our system due to non-recoverable disk/system errors. So, in my mind, that means it is possible that the current OST really IS empty of content. Is that really what is meant by getting this kind of output from the ls command for all 9 or 10 of these directories?

  /mnt/ost/O/0/d8:
  total 0

  /mnt/ost/O/0/d9:
  total 0

Assuming the disk really is empty, then, and LAST_ID really is zero, shall I leave it at zero and follow the recommendation of page 23-14, i.e., just shut down again, delete the lov_objid file on the MDS, and restart the system? Certainly the value at the correct index (29) is definitely hosed:

  # od -Ax -td8 /mnt/mdt/lov_objid
  (snip)
  0000d0        292648        346413
  0000e0         68225 -7137254917378053186
  0000f0         59064         59607
  000100         59227         59414

Thanks,
bob
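[A note on the offset arithmetic above: lov_objid holds one little-endian 64-bit value per OST index, so OST001d (index 29, hex 0x1d) lives at byte offset 29 * 8 = 232 (0xe8), i.e. the second value on the "0000e0" line of the od output. Device paths below are examples.]

  # Dump just the index-29 entry of lov_objid (MDT mounted as ldiskfs).
  od -Ax -td8 -j 232 -N 8 /mnt/mdt/lov_objid

  # Extract and decode an OST's LAST_ID the same way as above.
  debugfs -c -R 'dump /O/0/LAST_ID /tmp/LAST_ID' /dev/sdc
  od -Ax -td8 /tmp/LAST_ID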
On 2010-09-10, at 08:21, Bob Ball wrote:
> Now, we move on to the "bad" OST. First, I did an "lfs find" yesterday on
> just this OST and came up with some 8000 files before it seemed to cease
> output. So, I expected to see SOMETHING on the physical disk. But, in fact,
> the /tmp/objects.sdc listing showed no content whatsoever? Just blank
> lines, and a direct look at the ls output confirmed that. And so, the
> confusion began.
>
> LAST_ID is, indeed, zero.
>
> I am checking with my co-conspirator in this. It is _possible_ that this
> OST ID was re-used after a machine was dropped from our system due to
> non-recoverable disk/system errors. So, in my mind, that means it is
> possible that the current OST really IS empty of content. Is that really
> what is meant by getting this kind of output from the ls command for all
> 9 or 10 of these directories?
>
> /mnt/ost/O/0/d8:
> total 0
>
> /mnt/ost/O/0/d9:
> total 0

There should be 32 such directories.

> Assuming the disk really is empty, then, and LAST_ID really is zero, shall
> I leave it at zero and follow the recommendation of page 23-14, i.e., just
> shut down again, delete the lov_objid file on the MDS, and restart the
> system? Certainly the value at the correct index (29) is definitely hosed:
>
> # od -Ax -td8 /mnt/mdt/lov_objid
> (snip)
> 0000d0        292648        346413
> 0000e0         68225 -7137254917378053186
> 0000f0         59064         59607
> 000100         59227         59414

Yes, that is definitely hosed. Deleting the lov_objid file from the MDS and remounting the MDS should fix this value. You could also just binary edit the file and set this to 1.
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
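[A sketch of one way to do the binary edit Andreas mentions, assuming OST001d is index 29 and therefore byte offset 29 * 8 = 232; the device path is an example, and Lustre must be stopped with the MDT mounted as ldiskfs. Back the file up first.]

  mount -t ldiskfs /dev/mdtdev /mnt/mdt
  cp -a /mnt/mdt/lov_objid /root/lov_objid.backup   # backup before touching anything

  # Write a little-endian 64-bit value of 1 into the slot for OST index 29.
  printf '\x01\x00\x00\x00\x00\x00\x00\x00' | \
      dd of=/mnt/mdt/lov_objid bs=1 seek=232 count=8 conv=notrunc

  od -Ax -td8 /mnt/mdt/lov_objid                    # verify index 29 now reads 1
  umount /mnt/mdt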
>> Assuming the disk really is empty, then, and LAST_ID really is zero,
>> shall I leave it at zero and follow the recommendation of page 23-14,
>> i.e., just shut down again, delete the lov_objid file on the MDS, and
>> restart the system? Certainly the value at the correct index (29) is
>> definitely hosed:
>>
>> # od -Ax -td8 /mnt/mdt/lov_objid
>> (snip)
>> 0000d0        292648        346413
>> 0000e0         68225 -7137254917378053186
>> 0000f0         59064         59607
>> 000100         59227         59414
>
> Yes, that is definitely hosed. Deleting the lov_objid file from the
> MDS and remounting the MDS should fix this value. You could also
> just binary edit the file and set this to 1.

Andreas, Bob, please be very, very careful with lov_objid. As I already wrote last week, I reproducibly get a hard kernel panic every time I test this: delete the file and then mount the MDT again. You can try it, but DO CREATE A BACKUP of this file, so that you can copy it back if something goes wrong.

Sorry, I don't have the time right now to work on the lov_objid-delete bug, not even time to write a suitable bug report :(

Cheers,
Bernd
I just made some random checks on the "lfs find" output for this OST from yesterday. Each file I checked was one lost when we had problems a few months back. The suggested "unlink" on these did not work in 1.8.3; it worked fine on a whole set yesterday with 1.8.4, but I obviously did not find them all. So, I am going to assume that this OST is completely empty.

I will make a backup of the lov_objid file, then see if I can do a binary edit using xxd, hopefully avoiding the kernel panic. Crossing my fingers. Will announce a short outage here to begin in 45 minutes from now.

bob

Bernd Schubert wrote:
> Andreas, Bob, please be very, very careful with lov_objid. As I already
> wrote last week, I reproducibly get a hard kernel panic every time I
> delete the file and then mount the MDT again. You can try it, but DO
> CREATE A BACKUP of this file, so that you can copy it back if something
> goes wrong.
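[For reference, an xxd round trip along the lines described here; the same offset assumption applies (index 29 at byte 0xe8), and paths are examples.]

  cp -a /mnt/mdt/lov_objid /root/lov_objid.backup   # backup first

  xxd /mnt/mdt/lov_objid /tmp/lov_objid.hex         # dump to an editable hex listing
  # Edit /tmp/lov_objid.hex so the eight bytes at offset 0x00e8 read
  # "0100 0000 0000 0000" (little-endian 1), then write the file back:
  xxd -r /tmp/lov_objid.hex /mnt/mdt/lov_objid

  od -Ax -td8 /mnt/mdt/lov_objid                    # confirm index 29 is now 1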
OK, this worked. I was able to rewrite the LAST_ID value stored in the lov_objid object to a value of 1, and when Lustre came up, the OST was back at "UP" (yay!).

However, there still seem to be problems with that OST, as "lfs find" comes up with files there that do not exist. I guess at some point we'll have to take a complete outage to fix the file system consistency. But in the meantime, thank you all for your help and advice.

I must say, though, I like this 1.8.4 version much better than 1.8.3. We were even able to migrate live between versions to do the upgrade, so no down-time was involved.

bob
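[As a pointer for the consistency pass mentioned above: the lfsck procedure in the manual reuses the mdsdb/ostdb files built with e2fsck, as sketched earlier in the thread, and then runs from a client. A rough outline, with example paths.]

  # Databases built with e2fsck -n --mdsdb/--ostdb while the targets are
  # unmounted (see the earlier sketch), then copied to a client node.
  lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb /lustre   # -n = report only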