megan
2008-Jun-16 22:37 UTC
[Lustre-discuss] How do I recover files from partial lustre disk?
Greetings!

I am using the Lustre 2.6.18-53.1.13.el5_lustre.1.6.4.3smp kernel on a CentOS 5 x86_64 Linux box. We had a hardware problem that caused the underlying ext3 partition table to completely blow up. As a result, only three of five OSTs are mountable. The main Lustre filesystem of this unit cannot be mounted because the MDS knows that two of its parts are missing.

The underlying set-up is JBOD hardware passed to the Linux OS (via an LSI 8888ELP card in this case) as simple devices, i.e. sde, sdf, ... The simple devices were partitioned using parted and formatted ext3, then Lustre was built on top of the five ext3 units. There was no striping done across units/JBODs. Three of the five units passed an e2fsck and an lfsck. Those remaining units are mounted as follows:

    /dev/sdc   13T  6.3T  5.7T  53%  /srv/lustre/OST/crew4-OST0003
    /dev/sdd   13T  6.3T  5.7T  53%  /srv/lustre/OST/crew4-OST0004
    /dev/sdf   13T  6.2T  5.8T  52%  /srv/lustre/OST/crew4-OST0001

Given that it is unlikely we shall be able to recover the underlying ext3 on the other two units, is there some method by which I might try to rescue the data from the three units currently mounted on the OSS?

Any and all suggestions genuinely appreciated.

megan
Andreas Dilger
2008-Jun-18 04:48 UTC
[Lustre-discuss] How do I recover files from partial lustre disk?
On Jun 16, 2008  15:37 -0700, megan wrote:
> I am using Lustre 2.6.18-53.1.13.el5_lustre.1.6.4.3smp kernel on a
> CentOS 5 linux x86_64 linux box.
> We had a hardware problem that caused the underlying ext3 partition
> table to completely blow up.  This is resulting in only three of five
> OSTs being mountable.  The main lustre disk of this unit cannot be
> mounted because the MDS knows that two of its parts are missing.

It should be possible to mount a Lustre filesystem with OSTs that are not available. However, access to files on the unavailable OSTs will cause the process to wait on OST recovery.

> [...set-up details and df output trimmed...]
>
> Being that it is unlikely that we shall be able to recover the
> underlying ext3 on the other two units, is there some method by which
> I might try to rescue the data from these last three units mounted
> currently on the OSS?

The recoverability of your data depends heavily on the striping of the individual files (i.e. the default striping). If your files have a default stripe_count = 1, then you can probably recover 3/5 of the files in the filesystem. If your default stripe_count = 2, then you can probably only recover 1/5 of the files, and if you have a higher stripe_count you probably can't recover any files.
What you need to do is mount one of the clients and mark the corresponding OSTs inactive with:

    lctl dl    # get device numbers for OSC 0000 and OSC 0002
    lctl --device N deactivate

Then, instead of the clients waiting for the OSTs to recover, the client will get an IO error when it accesses files on the failed OSTs.

To get a list of the files that are on the good OSTs, run:

    lfs find --ost crew4-OST0001_UUID --ost crew4-OST0003_UUID \
             --ost crew4-OST0004_UUID {mountpoint}

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
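[Editor's note: the two lctl steps above can be scripted. Below is a minimal, hypothetical sketch of picking the OSC device numbers out of `lctl dl` output with awk; the sample output text and the OST names are assumptions standing in for a real client's `lctl dl` listing, and the real `lctl` invocation is left as a comment.]

```shell
#!/bin/sh
# Hypothetical sketch: find the OSC device numbers for the failed OSTs
# in `lctl dl` output, then build the deactivate commands.
# The sample text below stands in for real `lctl dl` output on a client.
LCTL_DL_OUTPUT='  7 UP osc crew4-OST0000-osc-ffff8100 crew4-mdtlov_UUID 5
  8 UP osc crew4-OST0001-osc-ffff8100 crew4-mdtlov_UUID 5
  9 UP osc crew4-OST0002-osc-ffff8100 crew4-mdtlov_UUID 5'

FAILED_OSTS="OST0000 OST0002"   # the two unrecoverable OSTs (assumed names)

DEACTIVATE_CMDS=""
for ost in $FAILED_OSTS; do
    # device number is the first field of the matching "osc" line
    dev=$(printf '%s\n' "$LCTL_DL_OUTPUT" |
          awk -v o="$ost" '$3 == "osc" && $4 ~ o { print $1 }')
    DEACTIVATE_CMDS="$DEACTIVATE_CMDS
lctl --device $dev deactivate"
done

# On a real client you would now run each command; here we only print them.
echo "$DEACTIVATE_CMDS"
```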
megan
2008-Jun-18 21:33 UTC
[Lustre-discuss] How do I recover files from partial lustre disk?
Thank you, Andreas! Your information is wonderful. I did the following:

I logged into my MDS (same as the MGS) and issued the commands:

    shell-prompt> mount -t lustre /dev/md1 /srv/lustre/mds/crew4-MDT0000

No errors so far.

    shell-prompt> lctl
    dl                  (found my nids of failed JBODs)
    device 14
    deactivate
    device 16
    deactivate
    quit

On one of our servers, I mounted the lustre disk /crew4. The disk will hang a UNIX df or ls command. However...

    lfs find --ost crew4-OST0001_UUID --ost crew4-OST0003_UUID \
             --ost crew4-OST0004_UUID -print /crew4

did indeed provide a list of files. I saved the list to a text file. I will next see if I am able to copy a single file to a new location.

Thank you again, Andreas, for this incredibly useful information. Do you/Sun do paid Lustre consulting by any chance?

Later,
megan

On Jun 18, 12:48 am, Andreas Dilger <adil... at sun.com> wrote:
> [...full quote of previous message trimmed...]
Andreas Dilger
2008-Jun-19 05:31 UTC
[Lustre-discuss] How do I recover files from partial lustre disk?
On Jun 18, 2008  14:33 -0700, megan wrote:
> shell-prompt> mount -t lustre /dev/md1 /srv/lustre/mds/crew4-MDT0000
>
> No errors so far.
>
> shell-prompt> lctl
> dl                  (found my nids of failed JBODs)
> device 14
> deactivate
> device 16
> deactivate
> quit
>
> On one of our servers, I mounted the lustre disk /crew4.
> The disk will hang a UNIX df or ls command.

You actually need to do the "deactivate" step on the client. Then "ls" will get EIO on the file, and "df" will return data only from the available OSTs.

> However....
> lfs find --ost crew4-OST0001_UUID --ost crew4-OST0003_UUID \
>          --ost crew4-OST0004_UUID -print /crew4
>
> Did indeed provide a list of files.  I saved the list to a text
> file.  I will next see if I am able to copy a single file to a new
> location.
>
> Thank you again Andreas for this incredibly useful information.  Do
> you/Sun do paid Lustre consulting by any chance?

Yes, in fact we do...

> [...remaining quoted text trimmed...]

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Charles Taylor
2008-Jun-19 10:42 UTC
[Lustre-discuss] How do I recover files from partial lustre disk?
Just some feedback on the item below... When we were getting started with Lustre about seven or eight months ago, we were having stability problems and losing a lot of time to Lustre-related issues. We were newbies and willing to pay someone to help us clear some initial hurdles. I contacted (by phone and email) ClusterFS (I guess this was just prior to the Sun purchase, I don't know) to ask about some consulting help and never heard back from anyone. We all want you to be successful, and this seems like a lost revenue opportunity.

FWIW, we are doing pretty well now and are mostly very happy with Lustre, and this list has been invaluable (to wit, "The Dilger Procedure"). However, I'm sure there is plenty that we still don't know, and we plan to attend some training at the next opportunity.

Regards,

Charlie Taylor
UF HPC Center

>> Thank you again Andreas for this incredibly useful information.  Do
>> you/Sun do paid Lustre consulting by any chance?
>
> Yes, in fact we do...
Ms. Megan Larko
2008-Jun-20 21:27 UTC
[Lustre-discuss] How do I recover files from partial lustre disk?
This is a follow-up from Megan on 20 June 2008: success getting file information from the remaining OSTs.

Per the advice of Andreas, I mounted my good OSTs on my OSS. I went to the MDT and mounted /srv/lustre/mds/crew4-MDT0000.

On a compute node (not a Lustre data OSS node), I mounted the disk (/crew4), then used lctl to identify the known bad nids in /crew4 and ran "device {bad-nid}" then "deactivate" for each one. Finally, I used Andreas's suggestion of:

    lfs find --ost crew4-OST0001_UUID --ost crew4-OST0003_UUID \
             --ost crew4-OST0004_UUID --print /crew4 >& crew4.find.20Jun08

I received a 759 MB text output file of the names of files still resident on the remaining OSTs. (...and there was great rejoicing!)

So: I want to cp those known/found files from the read-only mounted device named /crew4 onto some good space. May I just use the Linux system "cp" command, or is there a better Lustre command that should be used for this specific task?

Thanks bunches!
megan

On Thu, Jun 19, 2008 at 12:52 PM, Ms. Megan Larko <dobsonunit at gmail.com> wrote:
> Howdy,
>
> WRT consulting, our experience at CREW (my company) is similar.  Our
> company president contacted Sun about Lustre consulting and the Sun
> people to whom he spoke knew nothing about it.  We are still
> interested.  I have signed up for the Lustre class to be taught at
> Sun in San Jose, California on July 15-17, 2008.  I am still learning
> how to set up and manage my Lustre filesystem.
>
> Our company has also purchased and received (yesterday) seven Xstore
> 16-bay JBODs and 110 Hitachi Ultrastar 1TB SATA hard drives to add to
> our current Lustre system.  We do use InfiniBand.  I don't know if
> the current system (I inherited it) used quotas.  We have two new
> servers coming for the new disk space.  I am following the new Lustre
> 1.6 release thread with great interest, as I believe that is what I
> will put onto the new servers to serve the new disk space (Lustre
> format) we have just purchased.
>
> Can't get to that Lustre class soon enough.
>
> megan
>
> On Thu, Jun 19, 2008 at 6:42 AM, Charles Taylor <taylor at hpc.ufl.edu> wrote:
>> [...quoted text trimmed...]
Andreas Dilger
2008-Jun-23 20:54 UTC
[Lustre-discuss] How do I recover files from partial lustre disk?
On Jun 20, 2008  17:27 -0400, Ms. Megan Larko wrote:
> This is a follow-up from Megan on 20 June 2008:
> Success getting file information from remaining OSTs.
>
> [...recovery steps trimmed...]
>
> I received a 759 MB text output file of the names of files still
> resident on the remaining OSTs.  (...and there was great rejoicing!)
> So--- I want to cp those known/found file names from the read-only
> mounted device named /crew4 onto some good space.  May I just use a
> linux system "cp" command or is there a better lustre command that
> should be used for this specific task?

If the files are single-stripe files then "cp" is fine. If the files have multiple stripes (you can check with "lfs getstripe filename ...") then you should probably just skip them.

If there is data in a striped file that is valuable even if you only have e.g. every other 1MB of the file, then you can recover the readable parts of the file with:

    COUNT=$((($(stat -c '%s' {filename}) + 65535) / 65536))
    dd if={filename} of={savefilename} bs=64k count=$COUNT conv=sync,noerror

The unreadable parts of the file will be filled with binary 0 (NUL) bytes.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
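[Editor's note: the COUNT expression in the dd recipe above is a ceiling division — it rounds the file size up to a whole number of 64 KiB blocks so dd copies the final partial block too. A small self-contained sketch of just that arithmetic (the file paths in the trailing comments are made up for illustration):]

```shell
#!/bin/sh
# Ceiling division from the dd recipe: number of 64 KiB (65536-byte)
# blocks needed to cover a file of SIZE bytes.
blocks_for() {
    size=$1
    echo $(( (size + 65535) / 65536 ))
}

blocks_for 1        # a 1-byte file needs 1 block
blocks_for 65536    # an exactly-one-block file still needs 1
blocks_for 65537    # one byte over a block boundary needs 2

# In practice SIZE comes from stat, e.g. (hypothetical paths):
#   COUNT=$(blocks_for "$(stat -c '%s' /crew4/somefile)")
#   dd if=/crew4/somefile of=/backup/somefile bs=64k count=$COUNT conv=sync,noerror
```

With conv=sync,noerror, dd pads short or unreadable blocks with NUL bytes instead of aborting, which is why the copy can proceed past stripes that lived on the dead OSTs.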