Hi everyone,

I need some help figuring out what may have happened here, as newly created files on an OST are being corrupted. I don't know whether this applies to all files written to this OST, or just to files on the order of 2 GB, but files are definitely being corrupted, with no errors reported by the OSS machine. Let me describe the situation.

We had been running Lustre 1.8.4 for several years. With our upgrade from SL5 to SL6.4 we also switched to Lustre 2.1.6. The OSTs were left as-is, with no reformatting. A few weeks ago, one of the OSTs on one OSS began throwing I/O errors. This was almost certainly related to an ill-performed replacement of a failed disk in the RAID-5 volume. e2fsck did not help, so the OST was set read-only and drained using lfs_migrate. When the drain was complete, the following sequence of commands was used to reformat and remount the volume; this procedure had been used successfully under Lustre 1.8.4. The OST is a 9-disk, RAID-5, 5.5 TB volume on a Dell MD1000 shelf behind a PERC-6 controller. A second 5-disk RAID-5 shares the shelf, with the 15th disk as a hot spare, and that second volume is not having issues.

First, back up the identifiers from the old OST:

    mkdir reformat
    cd reformat
    mkdir -p /mnt/ost
    mount -t ldiskfs /dev/sdc /mnt/ost
    mkdir sdc
    pushd /mnt/ost
    cp -p last_rcvd /root/reformat/sdc
    cd O
    cd 0
    cp -p LAST_ID /root/reformat/sdc
    cd ../..
    cp -p CONFIGS/* /root/reformat/sdc
    umount /mnt/ost

At this point, the web interface of Dell's OMSA was used to do a complete, slow initialization of the volume. No further action was taken until that process completed.

The index, inode count, and stripe settings are taken from the files saved above, as recorded when the volumes were first created (not shown in this email):

    mkfs.lustre --ost --mgsnode=10.10.1.140@tcp0 --fsname=umt3 --reformat --index=35 \
        --mkfsoptions="-i 2000000" --reformat \
        --mountfsoptions="errors=remount-ro,extents,mballoc,stripe=256" /dev/sdc

The UUID here is taken from /etc/fstab, where the entry had been commented out until we were ready to use the volume again:

    tune2fs -O uninit_bg -m 1 -U 02bcb3d2-ad48-4992-ba71-7b48787defea /dev/sdc
    e2fsck -fy /dev/sdc
    mount -t ldiskfs /dev/sdc /mnt/ost

Copy back all identifiers so that the volume can continue from where it left off:

    cd /root/reformat/sdc
    cp -v /mnt/ost/CONFIGS/mountdata mountdata.new2
    cp -fv mountdata /mnt/ost/CONFIGS
    cp last_rcvd /mnt/ost
    mkdir -p /mnt/ost/O/0
    chmod 700 /mnt/ost/O
    chmod 700 /mnt/ost/O/0
    cp -fv LAST_ID /mnt/ost/O/0
    umount /mnt/ost

Add the fstab entry back in, and remount the disk:

    vi /etc/fstab
    mount -av

I was quite pleasantly surprised by the speed of the reformatted volume when I used lfs_migrate to repopulate it from the list of files that had previously been migrated off. The volume seemed fine. Then a user reported that his newly created files (written with a gridftp variant) were corrupted when they landed on this volume, whereas copies made to a different volume were fine. md5sum shows the copies are indeed different, even though ls reports the same size.

Can anyone tell me what may have gone wrong here? Is there something I needed to do but did not? Where should I begin to look? Neither the client nor the OSS logged any kind of error for the volume during this time. I am truly at a loss here. All help is appreciated.

Thanks much,
bob
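
P.S. In case it helps anyone reproduce the check, this is roughly how I would confirm that a bad file's objects really live on the reformatted OST (index 35) and where its content diverges from a good copy. The paths and file names below are placeholders, not our actual ones:

    # show the file's stripe layout; the obdidx column should include 35
    # if its objects landed on the reformatted OST
    lfs getstripe /lustre/umt3/path/to/suspect-file

    # checksums of a corrupted copy versus a known-good copy of the same file
    md5sum suspect-copy good-copy

    # list the first few differing byte offsets (cmp -l prints the offset
    # and the differing octal byte values)
    cmp -l suspect-copy good-copy | head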