We had an LBUG on our MDS (on 15th Feb), so we attempted a failover to the second MGS/MDS server. This mounted the MGT fine but hung while mounting the MDT (for longer than 5 minutes). To resolve the problem, I unmounted the MGT and the MDT on a freshly booted MDS/MGS and mounted the MDT as ldiskfs. I then moved aside the CATALOGS, OBJECTS and last_rcvd files/directories, unmounted, and restarted Lustre (mount -t lustre ....).

This brought the file system back OK, but one of our scientists appears to have lost an entire directory of data from the time the file system was taken down. The MDS was initially taken out at 1400 (16 Feb), and the file system was fully back around 1500. The scientist has files in the directory from 1400 onwards, but approximately 4000 small files dating from the start of January are missing.

We are running Lustre 1.6.6 with a patched 2.6.18-92.1.10.el5 kernel on the servers; the client is running an unreleased patchless RH kernel 2.6.18-171.el5 with the 1.6.7.2 Lustre modules.

We should have good backups of our metadata, and we also have access to the removed ldiskfs files, which were simply renamed. The missing files have fairly predictable names; might that help in tracking down their content?

Is there any hope of recovering the missing files/directory?

GREG

--
Greg Matthews 01235 778658
Senior Computer Systems Administrator
Diamond Light Source, Oxfordshire, UK