Dear All,

Sorry that we have encountered a complicated situation with the OSTs in our Lustre-1.8.7 filesystem. The story is the following.

After an unexpected electric power failure, we cannot mount the five Lustre OST partitions located on one file server. The hardware looks OK, but it seems that the last_rcvd and mountdata files of each partition were broken, since when we run:

    /opt/lustre/sbin/tunefs.lustre --writeconf /dev/sdb1

it returns:

========================================================================
checking for existing Lustre data: found last_rcvd
tunefs.lustre: Unable to read 1.8 config /tmp/dirMSzCCL/mountdata.
Trying 1.4 config from last_rcvd
Reading last_rcvd
Feature compat=fffa5a5a, incompat=fffa5a5a
Read previous values:
Target:
Index:      -370086
UUID:       ZZ??ZZ??ZZ??ZZ??ZZ??ZZ??ZZ??ZZ??ZZ??ZZ??
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x202
            (OST upgrade1.4 )
Persistent mount opts:
Parameters:

tunefs.lustre FATAL: Must specify --mgsnode
tunefs.lustre: exiting with 22 (Invalid argument)
========================================================================

We tried several ways to fix this problem:

1. Copy the correct mountdata file from another OST which has no problem to the broken OST's CONFIGS/mountdata (with both the broken OST and the healthy OST mounted as ldiskfs). Then use:

       xxd /mnt/broken_OST/CONFIGS/mountdata /tmp/mountdata.edit

   to edit the OST index to that of the broken one, and convert it back.

2. We have no way to fix the broken last_rcvd file, so we just backed it up and deleted it from the broken OST.

After this we can mount the broken OST, and we can "ls" files through the Lustre client. But when running "df" on the Lustre client, the client hangs. We then unmounted the whole Lustre filesystem and ran:

    /opt/lustre/sbin/tunefs.lustre --writeconf /dev/sdb1

again on the broken OST. The same problem still remains.
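For completeness, the xxd round-trip we used to patch the mountdata copy looks roughly like the sketch below. It is demonstrated on a scratch file; in the real procedure the input is the mountdata copied from the healthy OST, and the field changed in the hex dump is the OST index (we located it by comparing dumps from OSTs whose indices we knew — the exact byte offset inside mountdata is not shown here and should be verified on your own tree):

```shell
# Hedged sketch: edit a binary file via an xxd hex dump and convert it back.
# /tmp/mountdata.copy stands in for CONFIGS/mountdata taken from a healthy OST.
printf '\x01\x02\x03\x04' > /tmp/mountdata.copy   # stand-in binary file
xxd /tmp/mountdata.copy > /tmp/mountdata.hex      # binary -> editable hex dump
# ... edit the index bytes in /tmp/mountdata.hex with any text editor ...
xxd -r /tmp/mountdata.hex > /tmp/mountdata.new    # hex dump -> binary again
cmp /tmp/mountdata.copy /tmp/mountdata.new && echo "round-trip OK"
```

If nothing is edited in between, `xxd -r` reproduces the input byte-for-byte, which is a useful sanity check before touching the real file.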
Then we ran "e2fsck" on the broken OST (it completed its work without problems), and tried this command:

    debugfs -c -R 'ls /O/0/' /dev/sdb1

The error message is:

debugfs 1.40-WIP (14-Nov-2006)
/dev/sdb1: catastrophic mode - not reading inode or group bitmaps
/O/0/: EXT2 directory corrupted

So we guess that, fundamentally, the backend EXT3 / ldiskfs filesystem still has something wrong. No matter how many times we run "e2fsck" on it, the problem remains. We then decided to back up the data of the OST and recreate it. But I think at this point I did something seriously wrong. Here are my steps:

1. Unmount the whole Lustre filesystem.
2. Run the command to disable the broken OST:
       lctl conf_param foo-OST000a.osc.active=0
   (because we know that it can be re-enabled later with the command:
       lctl conf_param foo-OST000a.osc.active=1 )
3. Run the command:
       tunefs.lustre --writeconf /dev/XXX
   for the MDT and every OST except the broken one.
4. Back up the data of the broken OST from its ldiskfs filesystem.
5. Reformat the broken OST as a new OST:
       mkfs.lustre --fsname foo --ost --mgsnode=IPADDR /dev/sdb1
6. Mount the new OST as ldiskfs and restore its data. The broken last_rcvd is removed, and the mountdata is recreated in the way I mentioned above.

We hoped that this way the correct last_rcvd would be regenerated, and the new OST could be mounted successfully. However, I got this error message:

=============================================================================
mount.lustre: mount /dev/sdb1 at /cfs/cwarp_ost2 failed: No such device or address
The target service failed to start (bad config log?) (/dev/sdc1).
See /var/log/messages.
=============================================================================

So I think I did something badly wrong in steps 2 and 3, which removed all information about the broken OST from the MDT. So may I ask whether there is another way to retrieve the data from the broken OST? Now what we have are: 1.
All the data trees in the MDT: ROOT/, with probably correct extended attributes.
2. All the data of the broken OST: O/0/*.

We don't mind writing scripts or simple code to retrieve the data and back it up, and then recreate the OST and copy the data back.

Besides, I would very much appreciate it if anyone could comment on my stupid procedures above. I know that I have many incorrect concepts about each step (such as: tunefs.lustre --writeconf /dev/XXX), which led to my wrong treatment.

Thanks very much for your help.

Best Regards,

T.H.Hsieh
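In case it helps the discussion, below is a rough bash sketch of the kind of script we have in mind for mapping MDT files to OST objects. It decodes a "trusted.lov" xattr value (as dumped in hex) into (ost_index, object_id, object path) triples. Two assumptions are baked in and should be checked against a real 1.8 tree before trusting anything: that the EA is an on-disk little-endian lov_mds_md_v1 (u32 magic 0x0bd10bd0, u32 pattern, u64 object_id, u64 group, u32 stripe_size, u32 stripe_count, then per stripe u64 object_id, u64 group, u32 gen, u32 ost_idx), and that on a 1.8 ldiskfs OST object N lives at O/0/d(N % 32)/N. The getfattr invocation and paths in the trailing comment are illustrative only.

```shell
#!/usr/bin/env bash
# Hedged sketch: decode a Lustre 1.8 "trusted.lov" xattr (hex-encoded)
# into "<ost_idx> <objid> O/0/dNN/<objid>" lines, one per stripe.
# Layout assumptions are described above -- verify them on your own tree.

le_u32() {            # little-endian hex (8 chars) -> decimal
  local h=$1 out="" i
  for ((i = 6; i >= 0; i -= 2)); do out+=${h:i:2}; done
  echo $((16#$out))
}

le_u64() {            # little-endian hex (16 chars) -> decimal
  local h=$1 out="" i
  for ((i = 14; i >= 0; i -= 2)); do out+=${h:i:2}; done
  echo $((16#$out))
}

lov_stripes() {       # arg: hex xattr value, e.g. 0xd00bd10b...
  local hex=${1#0x} s off objid idx count
  [ "$(le_u32 "${hex:0:8}")" -eq $((16#0bd10bd0)) ] || {
    echo "not a lov_mds_md_v1 EA?" >&2; return 1; }
  count=$(le_u32 "${hex:56:8}")          # lmm_stripe_count at byte 28
  for ((s = 0; s < count; s++)); do
    off=$((64 + s * 48))                 # 32-byte header, 24-byte stripe entries
    objid=$(le_u64 "${hex:$off:16}")
    idx=$(le_u32 "${hex:$((off + 40)):8}")
    echo "$idx $objid O/0/d$((objid % 32))/$objid"
  done
}

# Intended use on the ldiskfs-mounted MDT (illustrative paths):
#   ea=$(getfattr -e hex -n trusted.lov /mnt/mdt/ROOT/some/file \
#          | sed -n 's/^trusted\.lov=//p')
#   lov_stripes "$ea"
```

The emitted relative paths could then drive a copy loop from the broken OST's backup tree into the freshly formatted one; on a single-stripe file there is exactly one output line per file.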