Dear list,

We got a bad last_transno after an OST was remounted:

cat /proc/fs/lustre/obdfilter/besfs-OST0034/recovery_status
status: COMPLETE
recovery_start: 1265630163
recovery_duration: 466
completed_clients: 298/298
replayed_requests: 0
last_transno: -499056254903072891

Each time the OST finished recovery, the OSS crashed with a kernel "Oops" error and reported an error about deleting orphan objects. We are running "2.6.9-67.0.22.EL_lustre.1.6.6smp". To avoid the OSS crash, we have unmounted this OST. Any suggestions on how to solve this bad transno problem?

Best Regards
Lu Wang
--------------------------------------------------------------
Computing Center IHEP            Office: Computing Center,123
19B Yuquan Road                  Tel: (+86) 10 88236012-607
P.O. Box 918-7                   Fax: (+86) 10 8823 6839
Beijing 100049,China             Email: Lu.Wang at ihep.ac.cn
--------------------------------------------------------------
We set the OSC for "besfs-OST0034" on the MDS to "deactivate". The OSS has not crashed again. However, the problem has not been solved: the OST "besfs-OST0034" cannot be written to now.

------------------
Lu Wang
2010-02-08

-------------------------------------------------------------
From: Lu Wang
Sent: 2010-02-08 20:47:16
To: lustre-discuss
Cc:
Subject: [Lustre-discuss] BAD last_transno problem
On 2010-02-08, at 06:29, Lu Wang wrote:
> We set the osc of "besfs-OST0034" on MDS "deactivate". The OSS did
> not crash again. However, this problem has not been solved. The OST
> "besfs-OST0034" cannot be written now.
>
> ------------------
> Lu Wang
> 2010-02-08
>
> -------------------------------------------------------------
> From: Lu Wang
> Sent: 2010-02-08 20:47:16
> To: lustre-discuss
> Cc:
> Subject: [Lustre-discuss] BAD last_transno problem
>
> Dear list,
> we got a bad last_transno after an OST is remounted.
> cat /proc/fs/lustre/obdfilter/besfs-OST0034/recovery_status
> status: COMPLETE
> recovery_start: 1265630163
> recovery_duration: 466
> completed_clients: 298/298
> replayed_requests: 0
> last_transno: -499056254903072891
>
> Each time after the OST finished recovery, the OSS crashed. With a
> kernel "Oops error", and reports error about deleting orphan objects.

Having the actual "oops error" makes commenting on such problems a lot easier. It sounds like running "e2fsck -f" on this OST may avoid the oops, but it won't fix the transno error. You can mount the OST filesystem as ldiskfs and delete the "last_rcvd" file to clear the transno.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
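
The steps suggested above could be sketched roughly as the shell function below. This is only a sketch under several assumptions: the device path, mount point, and backup location (OST_DEV, MNT, BACKUP) are hypothetical placeholders, the OST is assumed to be already unmounted from Lustre (as described in the first message), and the RUN wrapper is just an illustration device that prints the commands by default instead of executing them. Copying last_rcvd aside before removing it is an extra precaution, not part of the advice in the reply.

```shell
#!/bin/sh
# Sketch only. OST_DEV, MNT and BACKUP are hypothetical placeholders;
# substitute the real block device and paths for the affected OST.
OST_DEV="${OST_DEV:-/dev/sdX}"
MNT="${MNT:-/mnt/ost_ldiskfs}"
BACKUP="${BACKUP:-/root/last_rcvd.besfs-OST0034.bak}"

# Dry run by default: every command is printed, not executed.
# Invoke with RUN= (empty) to actually run the commands as root.
RUN="${RUN-echo}"

clear_last_rcvd() {
    # Force a full filesystem check on the unmounted OST device.
    $RUN e2fsck -f "$OST_DEV"
    rc=$?
    # e2fsck exit status 4 or higher means errors remain or the check
    # itself failed, so stop before touching the filesystem.
    [ "$rc" -lt 4 ] || return "$rc"

    # Mount the OST backing filesystem directly as ldiskfs.
    $RUN mount -t ldiskfs "$OST_DEV" "$MNT" || return 1

    # Keep a copy of last_rcvd (extra precaution), then delete it
    # so the bad transno is cleared.
    if $RUN cp -p "$MNT/last_rcvd" "$BACKUP"; then
        $RUN rm "$MNT/last_rcvd"
    fi

    $RUN umount "$MNT"
}

clear_last_rcvd
```

Run as-is, the script only prints the five commands it would execute, which gives a chance to check the device and paths before rerunning it with RUN= set to empty. After the last step the OST would be remounted as a normal Lustre target and its OSC reactivated on the MDS; those steps are left out because the thread does not give the exact commands used at this site.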