Klaus Steden
2008-May-17  02:05 UTC
[Lustre-discuss] Help reviving a 1.4.x volume with a destroyed OST
Hello there,

We had a bit of an accident in one of our labs earlier today, and it effectively destroyed one of the OSTs in the Lustre file system. From what I can figure (I wasn't there at the time), one of the OSSes re-provisioned itself accidentally, and installed its OS information on one of the OSTs in the cluster. So now we've got a file system with 16 OSTs, one of which is actually a regular Linux OS install.

We're not quite so worried about the data that's been lost, but it would be good to bring the file system back online with the hole in place to inspect it for damage, and then subsequently reformat the damaged piece and re-insert it into the existing file system.

I've tried doing an 'lctl --inactive <UUID> config.xml' on the OSS in question, but it always errors out. I can't pull the UUID off the disk itself, presumably because it was destroyed when the disk was rewritten. From the config.xml, the UUIDs all look pretty generic ('ost2_UUID', 'ost7_UUID', etc.), but if I use 'blkid' on any of the corresponding LUNs, I get strings that resemble actual real-world UUIDs.

Is there any place I can extract the previously-generated-and-now-sadly-destroyed UUID for the damaged OST? Is the generic-looking UUID field in the XML file an actual UUID?

When it comes time to re-insert the OST in question back into the file system, is it simply a matter of adding it the same way as adding a new OST, or will I have to remove information about the previous OST if I want to replace it inline?

I looked through the manual and Google fairly extensively, but I couldn't quite find the information I was looking for. Any help would be greatly appreciated!

thanks,
Klaus