Hi all,

We experienced a serious RAID problem, and the OST on that RAID was corrupted and can no longer be mounted. dmesg showed the messages below when I tried to mount it as ldiskfs:

LDISKFS-fs error (device sdd): ldiskfs_check_descriptors: Checksum for group 14208 failed (51136!=40578)
LDISKFS-fs: group descriptors corrupted!

Then I tried to repair it using e2fsck, but it appears to enter an endless loop and never stops. I still couldn't mount the device as ldiskfs after I sent a kill signal to e2fsck. I also tried some advice found on the list, such as "tune2fs -O uninit_bg /dev/xxx, then e2fsck", but none of it helped. Our Lustre version is 1.8.1.1 with e2fsprogs-1.41.10.sun2.

Can Mr. Andreas Dilger give me some advice?

Any help will be greatly appreciated. Thanks!

Best Regards
An 8TB LUN is a big device. I once had an error on a 3TB OST device, and e2fsck took about 10 hours on it, with Lustre 1.8.1.1 and e2fsprogs-1.41.10.sun2. Maybe you can upgrade your e2fsprogs first?

2011/9/19 enqiang zhou <eqzhou at gmail.com>:
> It's an 8TB LUN and e2fsck has been running for about 30 hours. I'm not sure
> whether I should wait longer for it to finish.
> [...]
>
> 2011/9/19, Larry <tsrjzq at gmail.com>:
>> You say that e2fsck enters an endless loop; maybe you haven't given it
>> enough time. By the way, you'd better attach some logs.
>>
>> On 9/18/11, enqiang zhou <eqzhou at gmail.com> wrote:
>>> hi,all
>>> [...]
It's an 8TB LUN and e2fsck has been running for about 30 hours. I'm not sure whether I should wait longer for it to finish. Below is part of e2fsck's log.

... ...
Illegal block number passed to ext2fs_test_block_bitmap #4243581855 for multiply claimed block map
Illegal block number passed to ext2fs_test_block_bitmap #2363489791 for multiply claimed block map
Illegal block number passed to ext2fs_test_block_bitmap #4091539423 for multiply claimed block map
Illegal block number passed to ext2fs_test_block_bitmap #3989622221 for multiply claimed block map
Illegal block number passed to ext2fs_test_block_bitmap #1339682798 for multiply claimed block map
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
19:06
Pass 1D: Reconciling multiply-claimed blocks

... (inode #423776, mod time Tue Oct 6 18:10:49 1981)
... (inode #405328, mod time Tue Oct 6 18:10:49 1981)
... (inode #366432, mod time Tue Oct 6 18:10:49 1981)
... (inode #349536, mod time Tue Oct 6 18:10:49 1981)
... (inode #329824, mod time Tue Oct 6 18:10:49 1981)
... (inode #312928, mod time Tue Oct 6 18:10:49 1981)
... (inode #275296, mod time Tue Oct 6 18:10:49 1981)
... (inode #238688, mod time Tue Oct 6 18:10:49 1981)
... (inode #223056, mod time Tue Oct 6 18:10:49 1981)
... (inode #220768, mod time Tue Oct 6 18:10:49 1981)
... (inode #201056, mod time Tue Oct 6 18:10:49 1981)
... (inode #184160, mod time Tue Oct 6 18:10:49 1981)
... (inode #164448, mod time Tue Oct 6 18:10:49 1981)
... (inode #146528, mod time Tue Oct 6 18:10:49 1981)
... (inode #126816, mod time Tue Oct 6 18:10:49 1981)
... (inode #109920, mod time Tue Oct 6 18:10:49 1981)
... (inode #90208, mod time Tue Oct 6 18:10:49 1981)
... (inode #74576, mod time Tue Oct 6 18:10:49 1981)
... (inode #72288, mod time Tue Oct 6 18:10:49 1981)
... (inode #35680, mod time Tue Oct 6 18:10:49 1981)
Clone multiply-claimed blocks? yes

Illegal block number passed to ext2fs_test_block_bitmap #3449154175 for multiply claimed block map
Clone multiply-claimed blocks? yes

Illegal block number passed to ext2fs_test_block_bitmap #3449154175 for multiply claimed block map

I'd appreciate any suggestions anyone could give me.

2011/9/18, enqiang zhou <eqzhou at gmail.com>:
> hi,all
> [...]
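
One way to judge whether e2fsck is still making progress or is repeating itself is to count the messages in its captured output. The following is only a minimal sketch, assuming the output was saved as text in the same form as the excerpt above; the function name `summarize_e2fsck_log` and the `SAMPLE_LOG` excerpt are hypothetical names chosen for illustration, and the regular expressions only match the message wording seen in this thread.

```python
import re
from collections import Counter

# Patterns match the message wording quoted in this thread (an assumption;
# other e2fsprogs versions may word these messages differently).
ILLEGAL_RE = re.compile(
    r"Illegal block number passed to ext2fs_test_block_bitmap #(\d+)"
)
INODE_RE = re.compile(r"\(inode #(\d+), mod time ")
PASS_RE = re.compile(r"Pass (\d[A-D]?):")


def summarize_e2fsck_log(text):
    """Count pass headers, listed inodes, and illegal block numbers."""
    illegal = Counter(int(n) for n in ILLEGAL_RE.findall(text))
    inodes = {int(n) for n in INODE_RE.findall(text)}
    return {
        "passes_seen": PASS_RE.findall(text),
        "illegal_blocks": illegal,
        # Block numbers reported more than once in the captured output.
        "repeated_illegal_blocks": {b: c for b, c in illegal.items() if c > 1},
        "inodes_listed": len(inodes),
    }


# Hypothetical excerpt assembled from lines of the log above.
SAMPLE_LOG = """\
Illegal block number passed to ext2fs_test_block_bitmap #1339682798 for multiply claimed block map
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
... (inode #74576, mod time Tue Oct 6 18:10:49 1981)
... (inode #35680, mod time Tue Oct 6 18:10:49 1981)
Clone multiply-claimed blocks? yes
Illegal block number passed to ext2fs_test_block_bitmap #3449154175 for multiply claimed block map
Clone multiply-claimed blocks? yes
Illegal block number passed to ext2fs_test_block_bitmap #3449154175 for multiply claimed block map
"""

if __name__ == "__main__":
    summary = summarize_e2fsck_log(SAMPLE_LOG)
    print("passes seen:", summary["passes_seen"])
    print("repeated illegal blocks:", summary["repeated_illegal_blocks"])
    print("distinct inodes listed:", summary["inodes_listed"])
```

A repeated block number is not proof of an endless loop, since the same block may be reported once per inode that claims it. Comparing the summary from two captures taken some hours apart may still give a rough idea of whether new inodes are being processed.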
Hi Zhou,

Make sure you're running the latest e2fsck:
http://downloads.whamcloud.com/public/e2fsprogs/latest/

To me, 30 hours seems like a long time for 8TB of data; however, if I recall correctly, pass 1D can take a very long time on a corrupted FS.

-cf

On 09/19/2011 08:02 AM, enqiang zhou wrote:
> It's an 8TB LUN and e2fsck has been running for about 30 hours. I'm not sure
> whether I should wait longer for it to finish.
> [...]