All,

We have a sizeable filesystem, and during a hardware upgrade our MDT disk was completely lost.
I am trying to find out if and how we can recover from such an event, but am not finding anything.

We were running Lustre 2.3 and have upgraded to 2.4 (or are in the process of doing so).

Can anyone point me in the right direction here?

Thanks in advance,

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238
I am not aware of any tool or method to recover from a lost MGT/MDT. Do you have any recent backups of your MDT device?

I would hold on to your MDT device with care and see if someone can help you resurrect it.

--Jeff

On 6/26/13 3:01 PM, Andrus, Brian Contractor wrote:
> We have a sizeable filesystem and during a hardware upgrade, our MDT disk was completely lost.
> I am trying to find if and how to recover from such an event, but am not finding anything.

-- 
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson-OPZmt/DU+TakJOqCEYON2AC/G2K4zDHf@public.gmane.org
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

/* New Address */
4170 Morena Boulevard, Suite D - San Diego, CA 92117

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Can you describe the failure in more detail?
Basically, I was adding capacity to a system while doing a fresh install. It turns out that /dev/sda, which used to be the disk in the bottom slot, became the disk in the top slot instead. That happened to be where the MDT was, and it was promptly repartitioned and formatted.

Not exactly something I was expecting....

Brian Andrus

> -----Original Message-----
> From: Colin Faber [mailto:colin_faber-qCPWdT176rRBDgjK7y7TUQ@public.gmane.org]
> Sent: Wednesday, June 26, 2013 5:08 PM
> To: Andrus, Brian Contractor
> Cc: lustre-discuss-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
> Subject: Re: [Lustre-discuss] Completely lost MGT/MDT
>
> Can you describe the failure in more detail?
On 2013/28/06 3:25 PM, "Andrus, Brian Contractor" <bdandrus-u6e/tGqFTB8@public.gmane.org> wrote:
>Basically, I was adding capacity to a system while doing a fresh install.
>Turns out /dev/sda which used to be the disk in the bottom slot became
>the disk in the top slot instead.
>That happened to be where the MDT was, which was promptly repartitioned
>and formatted.
>
>Not exactly something I was expecting....

Presumably you have no backups or snapshots of the MDT device? Lustre can handle a lot of inconsistency between the MDT and OSTs, even without running lfsck.

Also, there was once a similar situation with a reformatted MDT that was partly recovered using the "ext3grep" utility. This allowed finding the filename->inode mappings in the dirents in directory leaf blocks, and the ".." dirent allowed connecting the parent directories. In Lustre 2.x, the "link" xattr on the MDT inodes could also be used to recover the filenames even if the directory entries are lost.

This won't help as much if the whole disk has been overwritten by an OS install, but if only part of the MDT was overwritten you may be surprised how much is recoverable with ext4.

The first order of business is to make a copy of the whole disk before you try any further changes (this lets you try things and restart without losing any data if things go badly). Repartition the disk as it was before (possibly without any partition table at all for Lustre, or the disk could be dumped into an image file if it is not too huge). Then build and run the "findsuper" utility from the e2fsprogs code (I've attached it here) and try to find any existing (old) superblocks from before the reformat.
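As a rough illustration of what a superblock search looks for, here is a short Python sketch. It is not the findsuper program from e2fsprogs and not what was attached to the original message. It assumes the standard ext2/ext3/ext4 superblock layout (magic 0xEF53 at byte 56, block-size exponent at byte 24, group number at byte 90, UUID at byte 104, label at byte 120). The names parse_superblock and scan_image are made up for this example, and it should only ever be pointed at an image copy, never at the original disk.

```python
import struct
import uuid

EXT_MAGIC = 0xEF53   # s_magic for ext2/ext3/ext4 superblocks
SB_SIZE = 1024       # size of the on-disk superblock structure


def parse_superblock(buf):
    """Return a few superblock fields as a dict, or None if buf is not a superblock."""
    if len(buf) < SB_SIZE:
        return None
    (magic,) = struct.unpack_from("<H", buf, 56)       # s_magic
    if magic != EXT_MAGIC:
        return None
    (blocks_lo,) = struct.unpack_from("<I", buf, 4)    # s_blocks_count, low 32 bits only
    (log_bs,) = struct.unpack_from("<I", buf, 24)      # s_log_block_size
    (group,) = struct.unpack_from("<H", buf, 90)       # s_block_group_nr
    if log_bs > 6:                                     # implausible block size: false hit
        return None
    label = bytes(buf[120:136]).split(b"\0", 1)[0].decode("ascii", "replace")
    return {
        "fs_blocks": blocks_lo,
        "blksz": 1024 << log_bs,
        "grp": group,
        "uuid": str(uuid.UUID(bytes=bytes(buf[104:120]))),
        "label": label,
    }


def scan_image(path, step=512):
    """Yield (byte_offset, fields) for each superblock candidate in an image file."""
    with open(path, "rb") as img:
        offset = 0
        while True:
            img.seek(offset)
            buf = img.read(SB_SIZE)
            if len(buf) < SB_SIZE:
                break
            fields = parse_superblock(buf)
            if fields is not None:
                yield offset, fields
            offset += step
```

Reading 1 KiB at every 512-byte step is slow in Python on a large device, and a bare magic-number match produces false hits, so for a real scan the findsuper utility remains the better tool.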
You can tell superblocks from the same filesystem by the same start/end/blocks and increasing group number:

byte_offset  byte_start     byte_end  fs_blocks  blksz  grp  mkfs/mount_time           sb_uuid   label
    1049600     1048576    525336576     512000   1024    0  Wed Sep 12 16:39:47 2012  8f8531a2
    9438208     1048576    525336576     512000   1024    1  Wed Sep 12 16:39:47 2012  8f8531a2
   26215424     1048576    525336576     512000   1024    3  Wed Sep 12 16:39:47 2012  8f8531a2
   42992640     1048576    525336576     512000   1024    5  Wed Sep 12 16:39:47 2012  8f8531a2
   59769856     1048576    525336576     512000   1024    7  Wed Sep 12 16:39:47 2012  8f8531a2
   76547072     1048576    525336576     512000   1024    9  Wed Sep 12 16:39:47 2012  8f8531a2
  135266304     1048576   8590983168    2097152   4096    1  Tue Jan 18 15:06:12 2011  e1e13f16  boot
  210764800     1048576    525336576     512000   1024   25  Wed Sep 12 16:39:47 2012  8f8531a2
  227542016     1048576    525336576     512000   1024   27  Wed Sep 12 16:39:47 2012  8f8531a2
  403701760     1048576   8590983168    2097152   4096    3  Tue Jan 18 15:06:12 2011  e1e13f16  boot
  412091392     1048576    525336576     512000   1024   49  Wed Sep 12 16:39:47 2012  8f8531a2
  525337600   525336576   9115271168    2097152   4096    0  Tue Jan 18 15:06:12 2011  e1e13f16  root_fc13
  659554304   525336576   9115271168    2097152   4096    1  Tue Jan 18 15:06:12 2011  e1e13f16  root_fc13
  659750912   525533184  17705402368    4194304   4096    1  Thu Jan 13 14:29:26 2011  6740a155

Then run "e2fsck -fn -b {block} -B 4096 /dev/XXX" for one of the MDT superblocks. With -n, e2fsck opens the device read-only and only reports what it finds; a repair pass without -n writes to the device, superblocks included, so run that only once you have the copy. A repair will potentially recover some of your old MDT filesystem into lost+found, and you can move these files into a directory called "ROOT" at the top. Use "getfattr" to extract the filenames from the "link" xattr.

Hope this helps. This is one reason why I encourage everyone to make full "dd" backups of their MDT device. It doesn't take much space, but is critical to the whole filesystem.
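The {block} argument to e2fsck is a block number inside the filesystem, not a byte offset on the disk, so it has to be derived from the findsuper columns. A small Python sketch of that arithmetic follows. The helper names (backup_block, e2fsck_dry_run) are hypothetical, and the calculation assumes that byte_offset and byte_start are measured from the start of the same scanned device, as in the listing above, and that e2fsck is then run against a device or image that begins at byte_start. The sample row is the "boot" filesystem from the example listing, not an MDT.

```python
def backup_block(byte_offset, byte_start, blksz):
    """Block number, counted from the start of the filesystem, of a superblock copy."""
    return (byte_offset - byte_start) // blksz


def e2fsck_dry_run(row, device):
    """Build a read-only e2fsck command line from one findsuper output row."""
    fields = row.split()
    byte_offset, byte_start = int(fields[0]), int(fields[1])
    blksz, grp = int(fields[4]), int(fields[5])
    if grp == 0:
        raise ValueError("group 0 holds the primary superblock; pick a backup group")
    block = backup_block(byte_offset, byte_start, blksz)
    # -n keeps e2fsck read-only; -b and -B select the backup superblock and block size.
    return ["e2fsck", "-fn", "-b", str(block), "-B", str(blksz), device]


row = "135266304 1048576 8590983168 2097152 4096 1 Tue Jan 18 15:06:12 2011 e1e13f16 boot"
print(" ".join(e2fsck_dry_run(row, "/dev/XXX")))
# e2fsck -fn -b 32768 -B 4096 /dev/XXX
```

The same arithmetic on the 1024-byte-block rows gives 8193 for group 1, which matches the usual location of the first backup superblock for that block size.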
Cheers, Andreas
-- 
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
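The advice above to copy the whole device before experimenting, and to keep full "dd" backups of the MDT, amounts to a raw block-for-block copy. As a minimal Python sketch of the same idea (dd itself is the usual tool; copy_image and file_sha256 are names invented here, and any paths passed in are placeholders), the copy below also records a SHA-256 so the image can be verified later:

```python
import hashlib


def copy_image(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Copy src to dst in chunks; return (bytes_copied, sha256 hex digest of the data)."""
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()


def file_sha256(path, chunk_size=4 * 1024 * 1024):
    """Recompute the SHA-256 of a file, for checking an image against the recorded digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Unlike dd with conv=noerror, this sketch stops at the first read error, so a failing disk calls for a tool built for that case.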
On 06/29/2013 12:24 AM, Dilger, Andreas wrote:
> Also, there was once a similar situation with a reformatted MDT that was
> partly recovered using the "ext3grep" utility. This allowed finding the
> filename->inode mappings in the dirents in directory leaf blocks, and the
> ".." dirent allowed connecting the parent directories. In Lustre 2.x, the
> "link" xattr on the MDT inodes could also be used to recover the filenames
> even if the directory entries are lost.

I guess you are referring to the recovery I did at DDN (*)? Actually, ext3grep didn't do what we needed, so I wrote our own tool. In the meantime, Kit did another recovery, further improved the tools, and uploaded them to

http://code.google.com/p/decode-ost-attr/
http://code.google.com/p/mdt-recovery/

Cheers,
Венедикт

PS: Sorry, for some reason I'm using an alias name.