Hoping for a quick sanity check: I have migrated all the files that were on a damaged OST, recreated the software RAID array, and put a Lustre file system on it. I am now at the point where I want to re-introduce it to the scratch file system as if it was never gone. I used:

    tunefs.lustre --index=27 /dev/md4

to set the right index for the file system (the information is below). I just want to make sure there is nothing else I need to do before I pull the trigger on mounting it. (The things that have me concerned are the differences in the flags, and less so the "OST first_time update".)

<pre rebuild>

[root@oss-scratch obdfilter]# tunefs.lustre /dev/md4
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     scratch1-OST001b
Index:      27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib

   Permanent disk data:
Target:     scratch1-OST001b
Index:      27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib

exiting before disk write.

<after reformat and tunefs>

[root@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     scratch1-OST001b
Index:      27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib

   Permanent disk data:
Target:     scratch1-OST001b
Index:      27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib

exiting before disk write.
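(As an aside on the flags question: the difference itself is expected right after mkfs.lustre. A rough decoding of the bits, based on my reading of the lustre_disk.h definitions; treat the symbolic names as assumptions rather than gospel:)

    # Flags field in CONFIGS/mountdata (names per lustre_disk.h, as I understand it):
    #   0x02 = OST          (LDD_F_SV_TYPE_OST - the target is an OST)
    #   0x20 = first_time   (LDD_F_VIRGIN      - target has never registered with the MGS)
    #   0x40 = update       (LDD_F_UPDATE      - registration is (re)sent to the MGS on next mount)
    # so 0x62 = 0x40 | 0x20 | 0x02 -> "OST first_time update" on a freshly formatted target,
    # whereas the pre-rebuild disk showed plain 0x2 because it had long since registered.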
On 2010-05-26, at 13:18, Mervini, Joseph A wrote:
> I have migrated all the files that were on a damaged OST and have recreated the software raid array and put a lustre file system on it.
>
> I am now at the point where I want to re-introduce it to the scratch file system as if it was never gone. I used:
>
> tunefs.lustre --index=27 /dev/md4 to get the right index for the file system (the information is below). I just want to make sure there is nothing else I need to do before I pull the trigger on mounting it. (The things that have me concerned are the differences in the flags, and less so the "OST first_time update".)

The use of tunefs.lustre is not sufficient to make the new OST identical to the previous one. You should also copy the O/0/LAST_ID file, last_rcvd, and mountdata files over, at which point you don't need tunefs.lustre at all.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
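(To make that suggestion concrete, a minimal sketch of carrying those three files over, assuming the old OST device is still mountable as ldiskfs. The device name /dev/md4_old and the mount points are made up for illustration, and as the follow-up below shows the old data had in fact already been discarded in this case, so this only applies while the original disk is still readable:)

    # mount both the old (damaged) and the newly formatted target as plain ldiskfs
    oss# mkdir -p /mnt/old /mnt/new
    oss# mount -t ldiskfs /dev/md4_old /mnt/old     # hypothetical name for the surviving old device
    oss# mount -t ldiskfs /dev/md4     /mnt/new
    # carry over the per-target state named above
    oss# cp -p /mnt/old/last_rcvd         /mnt/new/last_rcvd
    oss# cp -p /mnt/old/CONFIGS/mountdata /mnt/new/CONFIGS/mountdata
    oss# mkdir -p /mnt/new/O/0
    oss# cp -p /mnt/old/O/0/LAST_ID       /mnt/new/O/0/LAST_ID
    oss# umount /mnt/old /mnt/new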
Andreas,

I migrated all the files off the target with lfs_migrate. I didn't realize that I would need to retain any of the ldiskfs data if everything was moved. (I must have misinterpreted your earlier comment.)

So this is my current scenario:

1. All data from a failing OST has been migrated to other targets.
2. The original target was recreated via mdadm.
3. mkfs.lustre was run on the recreated target.
4. tunefs.lustre was run on the recreated target to set the index to what it was before it was reformatted.
5. No other data from the original target has been retained.

Question:

Based on the above conditions, what do I need to do to get this OST back into the file system?

Thanks in advance.

Joe

On May 26, 2010, at 1:29 PM, Andreas Dilger wrote:
> The use of tunefs.lustre is not sufficient to make the new OST identical to the previous one. You should also copy the O/0/LAST_ID file, last_rcvd, and mountdata files over, at which point you don't need tunefs.lustre at all.
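(One quick pre-flight check that may help here, assuming the MDS still has the osc device for this target configured; the parameter name mirrors the one Andreas uses in his reply below, just with the concrete index filled in:)

    # confirm the MDS still knows about scratch1-OST001b and see which object id it expects next
    mds# lctl dl | grep OST001b
    mds# lctl get_param osc.*OST001b*.prealloc_next_id   # the recreated LAST_ID should end up as next_id - 1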
On 2010-05-26, at 13:47, Mervini, Joseph A wrote:
> I migrated all the files off the target with lfs_migrate. I didn't realize that I would need to retain any of the ldiskfs data if everything was moved. (I must have misinterpreted your earlier comment.)
>
> So this is my current scenario:
>
> 1. All data from a failing OST has been migrated to other targets.
> 2. The original target was recreated via mdadm.
> 3. mkfs.lustre was run on the recreated target.
> 4. tunefs.lustre was run on the recreated target to set the index to what it was before it was reformatted.
> 5. No other data from the original target has been retained.
>
> Question:
>
> Based on the above conditions, what do I need to do to get this OST back into the file system?

Lustre is fairly robust about handling situations like this (e.g. recreating the last_rcvd file, the object hierarchy O/0/d{0..31}, etc.). The one item it will need help with is recreating the LAST_ID file on the OST. You can do this by hand by extracting the last-precreated object ID from the MDS and writing the LAST_ID file on the OST:

# extract the last allocated object IDs for all OSTs from the MDS
mds# debugfs -c -R "dump lov_objids /tmp/lo" /dev/{mdsdev}
# cut out the last allocated object ID for this OST index
mds# dd if=/tmp/lo of=/tmp/LAST_ID bs=8 skip=${OST index NN} count=1   # here NN = 27 (OST001b)
# verify the value is the right one (LAST_ID = next_id - 1)
mds# lctl get_param osc.*OST00NN.prealloc_next_id   # NN is the OST index
mds# od -td8 /tmp/LAST_ID
# get the OST filesystem ready for this value
ossN# mount -t ldiskfs /dev/{ostdev} /mnt/tmp
ossN# mkdir -p /mnt/tmp/O/0
mds# scp /tmp/LAST_ID ossN:/mnt/tmp/O/0/LAST_ID

This will avoid the OST trying to recreate thousands/millions of objects when it next reconnects. This could probably be handled internally by the OST, by simply bumping the LAST_ID value in the case that it is currently < 2 and the MDS is requesting some large value.
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
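(A minimal follow-up sketch, assuming the LAST_ID file has been written as Andreas describes above; the device and mount point names /dev/md4 and /mnt/scratch1-OST001b are examples only:)

    # sanity-check the value on the OST (should print prealloc_next_id - 1)
    ossN# od -td8 /mnt/tmp/O/0/LAST_ID
    ossN# umount /mnt/tmp
    # bring the target back into the file system
    ossN# mount -t lustre /dev/md4 /mnt/scratch1-OST001b
    # then confirm it shows up and starts filling again
    client# lfs df | grep OST001b

Watching the OSS and MDS syslogs during that first mount is also worthwhile, since that is where any remaining LAST_ID or object-recreation complaints would show up.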