Hoping for a quick sanity check: I have migrated all the files that were on a damaged OST, recreated the software RAID array, and put a Lustre file system on it. I am now at the point where I want to re-introduce it to the scratch file system as if it was never gone. I used:

    tunefs.lustre --index=27 /dev/md4

to set the right index for the file system (the information is below). I just want to make sure there is nothing else I need to do before I pull the trigger on mounting it. (The things that have me concerned are the differences in the flags, and less so the "OST first_time update".)

<pre rebuild>

[root@oss-scratch obdfilter]# tunefs.lustre /dev/md4
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     scratch1-OST001b
Index:      27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib

   Permanent disk data:
Target:     scratch1-OST001b
Index:      27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib

exiting before disk write.

<after reformat and tunefs>

[root@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     scratch1-OST001b
Index:      27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib

   Permanent disk data:
Target:     scratch1-OST001b
Index:      27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.10.2@o2ib mgsnode=10.10.10.5@o2ib failover.node=10.10.10.10@o2ib

exiting before disk write.
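(As an aside on the flags question: the difference itself is expected right after mkfs.lustre. A rough decoding of the bits, based on my reading of the lustre_disk.h definitions; treat the symbolic names as assumptions rather than gospel:)

    # Flags field in CONFIGS/mountdata (names per lustre_disk.h, as I understand it):
    #   0x02 = OST          (LDD_F_SV_TYPE_OST - the target is an OST)
    #   0x20 = first_time   (LDD_F_VIRGIN      - target has never registered with the MGS)
    #   0x40 = update       (LDD_F_UPDATE      - registration is (re)sent to the MGS on next mount)
    # so 0x62 = 0x40 | 0x20 | 0x02 -> "OST first_time update" on a freshly formatted target,
    # whereas the pre-rebuild disk showed plain 0x2 because it had long since registered.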
On 2010-05-26, at 13:18, Mervini, Joseph A wrote:
> I have migrated all the files that were on a damaged OST and have recreated the software raid array and put a lustre file system on it.
>
> I am now at the point where I want to re-introduce it to the scratch file system as if it was never gone. I used:
>
> tunefs.lustre --index=27 /dev/md4 to get the right index for the file system (the information is below). I just want to make sure there is nothing else I need to do before I pull the trigger on mounting it. (The things that have me concerned are the differences in the flags, and less so the "OST first_time update".)

The use of tunefs.lustre is not sufficient to make the new OST identical to the previous one. You should also copy the O/0/LAST_ID file, last_rcvd, and mountdata files over, at which point you don't need tunefs.lustre at all.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
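(To make that suggestion concrete, a minimal sketch of carrying those three files over, assuming the old OST device is still mountable as ldiskfs. The device name /dev/md4_old and the mount points are made up for illustration, and as the follow-up below shows the old data had in fact already been discarded in this case, so this only applies while the original disk is still readable:)

    # mount both the old (damaged) and the newly formatted target as plain ldiskfs
    oss# mkdir -p /mnt/old /mnt/new
    oss# mount -t ldiskfs /dev/md4_old /mnt/old     # hypothetical name for the surviving old device
    oss# mount -t ldiskfs /dev/md4     /mnt/new
    # carry over the per-target state named above
    oss# cp -p /mnt/old/last_rcvd         /mnt/new/last_rcvd
    oss# cp -p /mnt/old/CONFIGS/mountdata /mnt/new/CONFIGS/mountdata
    oss# mkdir -p /mnt/new/O/0
    oss# cp -p /mnt/old/O/0/LAST_ID       /mnt/new/O/0/LAST_ID
    oss# umount /mnt/old /mnt/new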
Andreas,

I migrated all the files off the target with lfs_migrate. I didn't realize that I would need to retain any of the ldiskfs data if everything was moved. (I must have misinterpreted your earlier comment.)

So this is my current scenario:

1. All data from a failing OST has been migrated to other targets.
2. The original target was recreated via mdadm.
3. mkfs.lustre was run on the recreated target.
4. tunefs.lustre was run on the recreated target to set the index to what it was before it was reformatted.
5. No other data from the original target has been retained.

Question:

Based on the above conditions, what do I need to do to get this OST back into the file system?

Thanks in advance.

Joe

On May 26, 2010, at 1:29 PM, Andreas Dilger wrote:
> The use of tunefs.lustre is not sufficient to make the new OST identical to the previous one. You should also copy the O/0/LAST_ID file, last_rcvd, and mountdata files over, at which point you don't need tunefs.lustre at all.
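(One quick pre-flight check that may help here, assuming the MDS still has the osc device for this target configured; the parameter name mirrors the one Andreas uses in his reply below, just with the concrete index filled in:)

    # confirm the MDS still knows about scratch1-OST001b and see which object id it expects next
    mds# lctl dl | grep OST001b
    mds# lctl get_param osc.*OST001b*.prealloc_next_id   # the recreated LAST_ID should end up as next_id - 1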
On 2010-05-26, at 13:47, Mervini, Joseph A wrote:
> I migrated all the files off the target with lfs_migrate. I didn't realize that I would need to retain any of the ldiskfs data if everything was moved. (I must have misinterpreted your earlier comment.)
>
> So this is my current scenario:
>
> 1. All data from a failing OST has been migrated to other targets.
> 2. The original target was recreated via mdadm.
> 3. mkfs.lustre was run on the recreated target.
> 4. tunefs.lustre was run on the recreated target to set the index to what it was before it was reformatted.
> 5. No other data from the original target has been retained.
>
> Question:
>
> Based on the above conditions, what do I need to do to get this OST back into the file system?

Lustre is fairly robust about handling situations like this (e.g. recreating the last_rcvd file, the object hierarchy O/0/d{0..31}, etc.). The one item it will need help with is recreating the LAST_ID file on the OST. You can do this by hand by extracting the last-precreated object ID from the MDS and writing the LAST_ID file on the OST:

# extract the last allocated object IDs for all OSTs from the MDS
mds# debugfs -c -R "dump lov_objids /tmp/lo" /dev/{mdsdev}
# cut out the last allocated object ID for this OST index
mds# dd if=/tmp/lo of=/tmp/LAST_ID bs=8 skip=${OST index NN} count=1   # here NN = 27 (OST001b)
# verify the value is the right one (LAST_ID = next_id - 1)
mds# lctl get_param osc.*OST00NN.prealloc_next_id   # NN is the OST index
mds# od -td8 /tmp/LAST_ID
# get the OST filesystem ready for this value
ossN# mount -t ldiskfs /dev/{ostdev} /mnt/tmp
ossN# mkdir -p /mnt/tmp/O/0
mds# scp /tmp/LAST_ID ossN:/mnt/tmp/O/0/LAST_ID

This will avoid the OST trying to recreate thousands/millions of objects when it next reconnects. This could probably be handled internally by the OST, by simply bumping the LAST_ID value in the case that it is currently < 2 and the MDS is requesting some large value.
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
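(A minimal follow-up sketch, assuming the LAST_ID file has been written as Andreas describes above; the device and mount point names /dev/md4 and /mnt/scratch1-OST001b are examples only:)

    # sanity-check the value on the OST (should print prealloc_next_id - 1)
    ossN# od -td8 /mnt/tmp/O/0/LAST_ID
    ossN# umount /mnt/tmp
    # bring the target back into the file system
    ossN# mount -t lustre /dev/md4 /mnt/scratch1-OST001b
    # then confirm it shows up and starts filling again
    client# lfs df | grep OST001b

Watching the OSS and MDS syslogs during that first mount is also worthwhile, since that is where any remaining LAST_ID or object-recreation complaints would show up.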