Last week we experienced a major hardware failure (disk controller) that brought down our system hard. Now that I have the replacement controller, I want to make sure I recover correctly. Below is the procedure I plan to follow based on what I've gathered from the Operations Manual.

Any comments?
Do I need to create the mds/ost DBs AFTER ll_recover_lost_found_objs?

Thanks!
-Joe

###MDT Recovery
# Capture fs state before doing anything
e2fsck -vfn /dev/$MDTDEV
# "safe" repair
e2fsck -vfp /dev/$MDTDEV
# Verify no more problems and generate mdsdb
e2fsck -vfn --mdsdb /tmp/mdsdb /dev/$MDTDEV

###OST Recovery
foreach OST
    # Capture fs state before doing anything
    e2fsck -vfn /dev/$OSTDEV
    # "safe" repair
    e2fsck -vfp /dev/$OSTDEV
    # Verify no more problems
    e2fsck -vfn --mdsdb /tmp/mdsdb --ostdb /tmp/ostXdb /dev/$OSTDEV

### Recover lost+found Objects
foreach OST
    mount -t ldiskfs /dev/$OSTDEV /mnt/ost
    ll_recover_lost_found_objs -v -d /mnt/ost/lost+found

### Coherency Check
lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ost1db,/tmp/ost2db,...,/tmp/ostNdb /lustre
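The "foreach OST" steps in the plan above are pseudocode. As a rough illustration only, the per-OST part could be written out as a dry run that prints the commands instead of executing them, which makes it easy to review the exact command lines before touching any device. The device paths (/dev/sdb, /dev/sdc), the helper names plan_ost and ostdb_list, and the /tmp/ost<N>db numbering are assumptions made for this sketch; the e2fsck and lfsck options are copied from the plan and are not otherwise verified here.

```shell
#!/bin/sh
# Dry-run sketch: prints the planned commands, executes none of them.
# Device paths and helper names are hypothetical placeholders.

# Print the three e2fsck passes planned for one OST.
# $1 = OST number (used in the ostdb file name), $2 = block device
plan_ost() (
    idx=$1
    dev=$2
    echo "e2fsck -vfn $dev"
    echo "e2fsck -vfp $dev"
    echo "e2fsck -vfn --mdsdb /tmp/mdsdb --ostdb /tmp/ost${idx}db $dev"
)

# Build the comma-separated ostdb list passed to lfsck.
# $1 = number of OSTs
ostdb_list() (
    count=$1
    list=""
    i=1
    while [ "$i" -le "$count" ]; do
        list="${list:+$list,}/tmp/ost${i}db"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
)

OSTDEVS="/dev/sdb /dev/sdc"   # hypothetical OST devices
n=0
for dev in $OSTDEVS; do
    n=$((n + 1))
    plan_ost "$n" "$dev"
done
echo "lfsck -n -v --mdsdb /tmp/mdsdb --ostdb $(ostdb_list "$n") /lustre"
```

Printing first and running by hand keeps a mistyped device name from reaching e2fsck -p, and the generated --ostdb list stays in step with the number of OSTs actually checked.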
You should not have to do the lfsck if the initial fsck's come back clean.
cliffw

On Mon, Feb 7, 2011 at 1:16 PM, Joe Digilio <jgd-lustre at metajoe.com> wrote:
> [...]
> ### Coherency Check
> lfsck -n -v --mdsdb /tmp/mdsdb --ostdb
> /tmp/ost1db,/tmp/ost2db,...,/tmp/ostNdb /lustre
Cliff, thank you for your help so far.

Unfortunately, the initial e2fsck's (-n) of both the MDT and the OSTs did not come back clean. Using "-p", the OSTs cleaned up nicely (in fact, most OST problems went away after the journal was recovered). The MDT had many files dumped to lost+found.

When I run "lfsck -d" it never seems to delete the orphans... Subsequent runs show exactly the same orphans. Many lines like this for all OSTs:

[0] zero-length orphan objid 1
[0] zero-length orphan objid 960
[0] zero-length orphan objid 992
lfsck: [0]: pass3 orphan found objid 1207392, 6234112 bytes
lfsck: [0]: pass3 orphan found objid 1207360, 6234112 bytes

Shouldn't those be deleted when using "-d"? Or am I misunderstanding the documentation?

Thanks again!
-Joe

On Mon, Feb 7, 2011 at 17:00, Cliff White <cliffw at whamcloud.com> wrote:
> You should not have to do the lfsck if the initial fsck's come back clean.
> cliffw
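To check the "exactly the same orphans" observation rather than eyeballing it, the output of two lfsck runs can be saved and compared. The following is a minimal sketch that assumes the orphan lines look like the ones quoted in this thread ("[N] ... orphan ... objid M"); other lfsck versions may print them differently. The function names orphan_ids and persisting_orphans and the log file names in the usage note are made up for illustration.

```shell
#!/bin/sh
# Sketch: list orphans that appear in both of two saved lfsck outputs.
# Assumes the line formats quoted in this thread; helper names are hypothetical.

# Read lfsck output on stdin, print unique "<ost index> <objid>" pairs.
orphan_ids() {
    sed -n 's/.*\[\([0-9][0-9]*\)\].*orphan.* objid \([0-9][0-9]*\).*/\1 \2/p' |
        LC_ALL=C sort -u
}

# Print the orphans present in both logs.
# $1 = output saved from the first run, $2 = output saved from the second run
persisting_orphans() (
    before=$(mktemp)
    after=$(mktemp)
    orphan_ids < "$1" > "$before"
    orphan_ids < "$2" > "$after"
    # comm -12 keeps only the lines common to both sorted inputs
    LC_ALL=C comm -12 "$before" "$after"
    rm -f "$before" "$after"
)
```

With the output of each run saved to a file (for example run1.log and run2.log, hypothetical names), `persisting_orphans run1.log run2.log` prints one "index objid" pair per orphan reported in both runs. An empty result would mean no orphan from the first run showed up again in the second.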